REAL PARTY IN INTEREST

This application is currently owned by STMicroelectronics, Inc. as indicated by an
assignment recorded on May 7, 2001 in the Assignment Records of the United States Patent and
Trademark Office at Reel 011769, Frame 0344.

RELATED APPEALS AND INTERFERENCES 

There are no known appeals or interferences that will directly affect, be directly affected by, 
or have a bearing on the Board's decision in this pending appeal. 

STATUS OF CLAIMS 

Claims 1-4, 6-14, and 16-22 have been rejected pursuant to a final Office Action dated April 
27, 2004. Claims 1-4, 6-14, and 16-22 are presented for appeal. A copy of the claims is provided in 
Appendix A. 

STATUS OF AMENDMENTS 

The Appellant filed an Amendment and Response Under 37 C.F.R. § 1.116 on June 25, 
2004 in response to the Office Action dated April 27, 2004. The Amendment and Response 
amended the specification to include serial numbers for ten related patent applications, which were 
requested by the Examiner. The Amendment and Response did not amend the claims. The 
Examiner refused to enter the Amendment and Response, asserting that it did not place the 
application in better form for appeal by materially reducing or simplifying the issues for appeal. 
SUMMARY OF CLAIMED SUBJECT MATTER

Regarding Claim 1, a data processor 100 includes an instruction execution pipeline 400.
(Application, Page 24, Lines 20-22). The instruction execution pipeline 400 includes a read stage
404, a write stage 407, and a first execution stage 405. (Application, Page 24, Line 22 - Page 25,
Line 4). The first execution stage 405 includes E execution units capable of producing data results
from data operands. (Application, Page 11, Lines 9-11; Page 29, Lines 4-7). The data processor 100
also includes a register file 505 that includes a plurality of data registers. (Application, Page 11,
Lines 11-12; Page 29, Lines 8-9). Each data register is capable of being read by the read stage 404
via at least one of R read ports R0-R7 of the register file 505, and each data register is capable of
being written by the write stage 407 via at least one of W write ports W0-W3 of the register file 505.
(Application, Page 11, Lines 12-17; Page 29, Lines 8-9). In addition, the data processor 100
includes bypass circuitry 500. (Application, Page 27, Line 22 - Page 28, Line 6). The bypass
circuitry 500 is capable of receiving data results from output channels of source devices in at least
one of the write stage 407 and the first execution stage 405. (Application, Page 28, Lines 6-9; Page
28, Line 21 - Page 29, Line 1). The bypass circuitry 500 includes a first plurality of bypass tristate
line drivers 511B-511I, 512B-512I, 513B-513I, and 514B-514I. (Application, Page 28, Lines 6-8).
The bypass tristate line drivers have input channels coupled to first output channels of a first plurality
of the source devices, and the bypass tristate line drivers have tristate output channels coupled to a
first common read data channel in the read stage 404. (Application, Page 11, Line 19 - Page 12, Line
2; Page 28, Lines 6-18). The bypass circuitry 500 also includes a first multiplexer 531 having a first
input channel coupled to the first common read data channel and an output channel coupled to a first
operand channel of a first execution unit in the first execution stage 405. (Application, Page 12, Line
19 - Page 13, Line 1; Page 28, Lines 19-21; Page 29, Lines 1-7).

Regarding Claim 11, a processing system 10 includes a data processor 100, a memory 130
coupled to the data processor 100, and a plurality of memory-mapped peripheral circuits 111-114
coupled to the data processor 100 for performing selected functions in association with the data
processor 100. (Application, Page 17, Line 9 - Page 18, Line 4). The data processor 100 includes an
instruction execution pipeline 400. (Application, Page 24, Lines 20-22). The instruction execution
pipeline 400 includes a read stage 404, a write stage 407, and a first execution stage 405.
(Application, Page 24, Line 22 - Page 25, Line 4). The first execution stage 405 includes E
execution units capable of producing data results from data operands. (Application, Page 11, Lines
9-11; Page 29, Lines 4-7). The data processor 100 also includes a register file 505 that includes a
plurality of data registers. (Application, Page 11, Lines 11-12; Page 29, Lines 8-9). Each data
register is capable of being read by the read stage 404 via at least one of R read ports R0-R7 of the
register file 505, and each data register is capable of being written by the write stage 407 via at least
one of W write ports W0-W3 of the register file 505. (Application, Page 11, Lines 12-17; Page 29,
Lines 8-9). In addition, the data processor 100 includes bypass circuitry 500. (Application, Page 27,
Line 22 - Page 28, Line 6). The bypass circuitry 500 is capable of receiving data results from output
channels of source devices in at least one of the write stage 407 and the first execution stage 405.
(Application, Page 28, Lines 6-9; Page 28, Line 21 - Page 29, Line 1). The bypass circuitry 500
includes a first plurality of bypass tristate line drivers 511B-511I, 512B-512I, 513B-513I, and
514B-514I. (Application, Page 28, Lines 6-8). The bypass tristate line drivers have input channels coupled
to first output channels of a first plurality of the source devices, and the bypass tristate line drivers
have tristate output channels coupled to a first common read data channel in the read stage 404.
(Application, Page 11, Line 19 - Page 12, Line 2; Page 28, Lines 6-18). The bypass circuitry 500
also includes a first multiplexer 531 having a first input channel coupled to the first common read
data channel and an output channel coupled to a first operand channel of a first execution unit in the
first execution stage 405. (Application, Page 12, Line 19 - Page 13, Line 1; Page 28, Lines 19-21;
Page 29, Lines 1-7).
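
For illustration only, the signal path summarized above for Claims 1 and 11 (several tristate line drivers sharing one common read data channel, which in turn feeds one input of a multiplexer) can be sketched as a small behavioral model. The Python sketch below is not taken from the Application. The names TristateDriver, common_read_data_channel, and multiplexer are hypothetical, the data values are arbitrary, and the one-driver-at-a-time rule is an assumption about ordinary tristate bus behavior rather than a statement about the claimed circuit.

```python
from typing import Optional, Sequence


class TristateDriver:
    """Hypothetical model of one tristate line driver.

    When disabled, the driver is in the high-impedance state and
    contributes nothing to the shared channel (modeled as None).
    """

    def __init__(self, value: int, enabled: bool = False) -> None:
        self.value = value
        self.enabled = enabled

    def output(self) -> Optional[int]:
        return self.value if self.enabled else None


def common_read_data_channel(drivers: Sequence[TristateDriver]) -> Optional[int]:
    """Resolve the value on a channel shared by several tristate outputs.

    Assumption: at most one driver may be enabled at a time.
    """
    driven = [d.value for d in drivers if d.enabled]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else None


def multiplexer(inputs: Sequence[Optional[int]], select: int) -> Optional[int]:
    """Pass the selected input channel through to the output channel."""
    return inputs[select]


# Three source devices drive the shared channel; only the second is enabled.
drivers = [
    TristateDriver(0x11),
    TristateDriver(0x22, enabled=True),
    TristateDriver(0x33),
]
channel = common_read_data_channel(drivers)

# Input 0 of the multiplexer is the common read data channel; input 1 is
# an arbitrary second source included only to show the selection step.
operand = multiplexer([channel, 0x99], select=0)
```

In this sketch the multiplexer has an input fed by the shared channel, which is a separate element from the drivers whose outputs feed that channel.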

GROUNDS OF REJECTION 

1. Claims 1-4, 6-10, and 21 stand rejected under 35 U.S.C. § 102(b) as being anticipated
by U.S. Patent No. 5,805,852. 

2. Claims 11-14, 16-20, and 22 stand rejected under 35 U.S.C. § 103(a) as being 
unpatentable over U.S. Patent No. 5,805,852 in view of U.S. Patent No. 4,591,973. 
ARGUMENT



I. GROUND OF REJECTION #1 (§ 102 REJECTION) 

The rejection of Claims 1-4, 6-10, and 21 under 35 U.S.C. § 102(b) is improper and should 
be withdrawn. 



A. OVERVIEW 

Claims 1-4, 6-10, and 21 stand rejected under 35 U.S.C. § 102(b) as being anticipated by U.S.
Patent No. 5,805,852 to Nakanishi ("Nakanishi"). 

A copy of Nakanishi is provided in Appendix B. 



B. STANDARD 

A prior art reference anticipates the claimed invention under 35 U.S.C. § 102 only if every 
element of a claimed invention is identically shown in that single reference, arranged as they are 
in the claims. (MPEP § 2131; In re Bond, 910 F.2d 831, 832, 15 U.S.P.Q.2d 1566, 1567 
(Fed. Cir. 1990)). Anticipation is only shown where each and every limitation of the claimed 
invention is found in a single prior art reference. (MPEP § 2131; In re Donohue, 766 F.2d 531, 534,
226 U.S.P.Q. 619, 621 (Fed. Cir. 1985)).



C. THE NAKANISHI REFERENCE 

Nakanishi recites a bypass control circuit for a processor. (Abstract). The bypass control
circuit is capable of providing data from result buffers in execution stages (EX) and memory access
stages (MEM) of the processor to latch circuits. (Abstract; Figure 3). The latch circuits then provide
the data to arithmetic and logic units (ALU) of the processor. (Figure 3). The processor also
includes multiple buses (elements 1-1 through 4-2) that provide data to the latch circuits in the
processor. (Figure 3). Each bus is associated with a particular one of the latch circuits. (Figure 3).
In addition, the processor includes tristate buffers (elements T1 through T72) that connect various
sources of data to the buses. (Figure 3).



D. CLAIMS 1-4, 6-10, AND 21 

Claim 1 recites a data processor, which includes: 

an instruction execution pipeline comprising: 
a read stage; 
a write stage; and 

a first execution stage comprising E execution units 
capable of producing data results from data operands; 

a register file comprising a plurality of data registers, each of 
said data registers capable of being read by said read stage of said 
instruction pipeline via at least one of R read ports of said register file 
and each of said data registers capable of being written by said write 
stage of said instruction pipeline via at least one of W write ports of 
said register file; and 

bypass circuitry capable of receiving data results from output 
channels of source devices in at least one of said write stage and said 
first execution stage, said bypass circuitry comprising: 

a first plurality of bypass tristate line drivers having 
input channels coupled to first output channels of a first plurality of 
said source devices and tristate output channels coupled to a first 
common read data channel in said read stage; and 

a first multiplexer having a first input channel coupled 
to said first common read data channel and an output channel coupled 
to a first operand channel of a first execution unit in said first 
execution stage. 
The Examiner asserts that Nakanishi anticipates "bypass circuitry" that includes both a
"plurality of bypass tristate line drivers" and a "multiplexer" as recited in Claim 1. (04/27/04 Office
Action, Pages 3-4, Paragraphs 7c-7d). However, the elements of Nakanishi relied upon by the
Examiner fail to anticipate the "bypass circuitry" recited in Claim 1.

The Examiner relies on the tristate buffers (elements T1-T72) of Nakanishi as anticipating the 
"plurality of bypass tristate line drivers" recited in Claim 1. (04/27/04 Office Action, Page 3, 
Paragraph 7c). The Examiner also relies on the tristate buffers (elements T1-T72) of Nakanishi as
anticipating the "multiplexer" recited in Claim 1. (04/27/04 Office Action, Pages 3-4, Paragraph
7d). In addition, the Examiner relies on any of the buses (elements 1-1 through 4-2) of Nakanishi as
anticipating the "common read data channel" recited in Claim 1. (08/03/04 Advisory Action, Page 2,
First paragraph). 

Claim 1 recites that the output channels of the bypass tristate line drivers are coupled to a 
"common read data channel." Claim 1 also recites that an input channel of the multiplexer is 
coupled to the "common read data channel." The Examiner relies on any of the buses (elements 1-1
through 4-2) of Nakanishi as anticipating the "common read data channel" recited in Claim 1. 
Because of this, the Examiner must show that Nakanishi anticipates a "plurality of bypass tristate 
line drivers" having "output channels" coupled to one of the buses of Nakanishi and a "multiplexer" 
having an "input channel" coupled to one of the buses of Nakanishi. The Examiner cannot make this 
showing. 

Figure 3 of Nakanishi clearly shows that the tristate buffers (elements T1-T72) have outputs
coupled to the buses (elements 1-1 through 4-2). Figure 3 also clearly shows that the inputs of the
tristate buffers are not coupled to the buses of Nakanishi. As a result, the tristate buffers of
Nakanishi cannot anticipate a "multiplexer" having an "input channel" coupled to a "common read
data channel" as recited in Claim 1. This is because the tristate buffers of Nakanishi do not have any
inputs coupled to any of the buses of Nakanishi, and the Examiner relies on the buses as anticipating
the "common read data channel" recited in Claim 1.

Moreover, Figure 3 also shows that the only components of Nakanishi that have inputs 
coupled to the buses are the latch circuits (elements L1-L8). The latch circuits of Nakanishi are not 
multiplexers. As a result, none of the latch circuits of Nakanishi anticipates a "multiplexer" as 
recited in Claim 1.

The Examiner argues that the tristate network of Nakanishi "performs the same function as 
the multiplexer." (04/27/04 Office Action, Page 4, Paragraph 7d). However, whether the tristate 
buffers of Nakanishi perform the function of a multiplexer is irrelevant. The issue is whether 
Nakanishi anticipates both a "plurality of bypass tristate line drivers" having "output channels" 
coupled to a "common read data channel" and a "multiplexer" having an "input channel" coupled to 
the "common read data channel." The Examiner cannot identify any "multiplexer" in Nakanishi that 
has an "input channel" coupled to a "common read data channel" as recited in Claim 1. 

For these reasons, the Examiner has failed to establish that Nakanishi anticipates all elements 
of Claim 1 (and its dependent claims). Accordingly, the Appellant respectfully requests that the final 
rejection of Claims 1-4, 6-10, and 21 be withdrawn and that Claims 1-4, 6-10, and 21 be passed to 
allowance. 
II. GROUND OF REJECTION #2 (§ 103 REJECTION)

The rejection of Claims 11-14, 16-20, and 22 under 35 U.S.C. § 103(a) is improper and 
should be withdrawn. 

A. OVERVIEW 

Claims 11-14, 16-20, and 22 stand rejected under 35 U.S.C. § 103(a) as being unpatentable
over Nakanishi in view of U.S. Patent No. 4,591,973 to Ferris, III et al. ("Ferris"). 
A copy of Ferris is provided in Appendix C. 

B. STANDARD 

In ex parte examination of patent applications, the Patent Office bears the burden of
establishing a prima facie case of obviousness. (MPEP § 2142; In re Fritch, 972 F.2d 1260, 1262,
23 U.S.P.Q.2d 1780, 1783 (Fed. Cir. 1992)). The initial burden of establishing a prima facie basis to
deny patentability to a claimed invention is always upon the Patent Office. (MPEP § 2142; In re
Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992); In re Piasecki, 745 F.2d
1468, 1472, 223 U.S.P.Q. 785, 788 (Fed. Cir. 1984)). Only when a prima facie case of obviousness
is established does the burden shift to the Appellant to produce evidence of nonobviousness. (MPEP
§ 2142; In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992); In re
Rijckaert, 9 F.3d 1531, 1532, 28 U.S.P.Q.2d 1955, 1956 (Fed. Cir. 1993)). If the Patent Office does
not produce a prima facie case of unpatentability, then without more the Appellant is entitled to grant
of a patent. (In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992); In re
Grabiak, 769 F.2d 729, 733, 226 U.S.P.Q. 870, 873 (Fed. Cir. 1985)).

A prima facie case of obviousness is established when the teachings of the prior art itself 
suggest the claimed subject matter to a person of ordinary skill in the art. (In re Bell, 991 F.2d 781,
783, 26 U.S.P.Q.2d 1529, 1531 (Fed. Cir. 1993)). To establish a prima facie case of obviousness,
three basic criteria must be met. First, there must be some suggestion or motivation, either in the 
references themselves or in the knowledge generally available to one of ordinary skill in the art, to 
modify the reference or to combine reference teachings. Second, there must be a reasonable 
expectation of success. Finally, the prior art reference (or references when combined) must teach or 
suggest all the claim limitations. The teaching or suggestion to make the claimed invention and the 
reasonable expectation of success must both be found in the prior art, and not based on Appellant's 
disclosure. (MPEP § 2142). 

C. THE FERRIS REFERENCE 

Ferris recites an input/output system for coupling a computer to a plurality of peripheral 
devices. (Abstract). The peripheral devices include digital-to-analog converters. (Col. 1, Lines 11-
13).

D. CLAIMS 11-14, 16-20, AND 22 

Claim 11 recites a processing system, which includes:

a data processor, wherein said data processor comprises: 
an instruction execution pipeline comprising: 
a read stage; 
a write stage; and

a first execution stage comprising E execution 
units capable of producing data results from data operands; 

a register file comprising a plurality of data registers, 
each of said data registers capable of being read by said read stage of 
said instruction pipeline via at least one of R read ports of said 
register file and each of said data registers capable of being written by 
said write stage of said instruction pipeline via at least one of W write 
ports of said register file; and 

bypass circuitry capable of receiving data results from 
output channels of source devices in at least one of said write stage 
and said first execution stage, said bypass circuitry comprising: 

a first plurality of bypass tristate line drivers 
having input channels coupled to first output channels of a first 
plurality of said source devices and tristate output channels coupled to 
a first common read data channel in said read stage; and 

a first multiplexer having a first input channel 
coupled to said first common read data channel and an output channel 
coupled to a first operand channel of a first execution unit in said first 
execution stage; 

a memory coupled to said data processor; and 
a plurality of memory-mapped peripheral circuits coupled to 
said data processor for performing selected functions in association 
with said data processor. 

As described above, Nakanishi fails to anticipate the use of both a "plurality of bypass tristate 
line drivers" having "output channels" coupled to a "common read data channel" and a "multiplexer" 
having an "input channel" coupled to the "common read data channel" as recited in Claim 11. 

Moreover, Nakanishi fails to suggest the use of a "multiplexer" having an "input channel" 
coupled to the "common read data channel" as recited in Claim 11. The Examiner relies on any of
the buses (elements 1-1 through 4-2) of Nakanishi as anticipating the "common read data channel"
recited in Claim 11. (08/03/04 Advisory Action, Page 2, First paragraph). Also, the buses of
Nakanishi are coupled to latch circuits (elements L1-L8). As a result, the burden is on the Examiner
to show that a person skilled in the art would modify Nakanishi to include a multiplexer between the
buses and the latch circuits of Nakanishi. The Examiner cannot make this showing.

The tristate buffers of Nakanishi are used to provide data from multiple sources to multiple 
buses, and each bus then provides data to a single latch circuit for processing. For example, tristate 
buffers T8, T16, T24, T32, T40, T48, T56, T64, and T72 are used to provide data from one of 
multiple sources to bus 4-2. (Figure 3). The bus 4-2 then provides the data to a single latch circuit
L8, which provides the data to a single ALU a4. (Figure 3). Similarly, tristate buffers T1, T9, T17,
T25, T33, T41, T49, T57, and T65 are used to provide data from one of multiple sources to bus 1-1.
(Figure 3). The bus 1-1 then provides the data to a single latch circuit L1, which provides the data to
a single ALU a1. (Figure 3).

It is clear here that no multiplexer is needed between the buses and the latch circuits of 
Nakanishi. In particular, each latch circuit of Nakanishi is only capable of receiving data from a 
single one of the buses. Because of this, there is no need for multiplexers between the buses and 
latch circuits of Nakanishi because each latch circuit receives data from a different bus. As a result, 
a person skilled in the art would not modify Nakanishi to include multiplexers between the buses and 
latch circuits of Nakanishi. 

The Examiner relies on Ferris only as allegedly disclosing the use of "peripheral circuits" as
recited in Claim 11. (04/27/04 Office Action, Page 8, Paragraphs 20-21). The Examiner does not
rely on Ferris as disclosing, teaching, or suggesting the use of both a "plurality of bypass tristate line
drivers" having "output channels" coupled to a "common read data channel" and a "multiplexer"
having an "input channel" coupled to the "common read data channel" as recited in Claim 11.

For these reasons, the Examiner has failed to establish that the proposed Nakanishi-Ferris
combination discloses, teaches, or suggests all elements of Claim 11. As a result, the Examiner has
not established a prima facie case of obviousness against Claim 11 (and its dependent claims).
Accordingly, the Appellant respectfully requests that the final rejection of Claims 11-14, 16-20, and
22 be withdrawn and that Claims 11-14, 16-20, and 22 be passed to allowance.
SUMMARY



The Appellant has demonstrated that the present invention as claimed is clearly 
distinguishable over the prior art cited of record. Therefore, the Appellant respectfully requests that 
the Board of Patent Appeals and Interferences reverse the final rejection of the Examiner and instruct 
the Examiner to issue a notice of allowance of all claims. 

The Appellant has enclosed a check in the amount of $340.00 to cover the cost of this Appeal 
Brief. The Appellant does not believe that any additional fees are due. However, the Commissioner 
is hereby authorized to charge any additional fees (including any extension of time fees) or credit any 
overpayments to Davis Munck Deposit Account No. 50-0208. 



P.O. Drawer 800889 

Dallas, Texas 75380 

(972) 628-3600 (main number) 

(972) 628-3616 (fax) 

E-mail: wmunck@davismunck.com



Respectfully submitted, 



Davis Munck, P.C. 





William A. Munck
Registration No. 39,308 
APPENDIX A
PENDING CLAIMS 

1. A data processor comprising:

an instruction execution pipeline comprising: 
a read stage; 
a write stage; and 

a first execution stage comprising E execution units capable of producing data results 
from data operands; 

a register file comprising a plurality of data registers, each of said data registers capable of 
being read by said read stage of said instruction pipeline via at least one of R read ports of said 
register file and each of said data registers capable of being written by said write stage of said 
instruction pipeline via at least one of W write ports of said register file; and 

bypass circuitry capable of receiving data results from output channels of source devices in at 
least one of said write stage and said first execution stage, said bypass circuitry comprising: 

a first plurality of bypass tristate line drivers having input channels coupled to first 
output channels of a first plurality of said source devices and tristate output channels coupled to a 
first common read data channel in said read stage; and 

a first multiplexer having a first input channel coupled to said first common read data 
channel and an output channel coupled to a first operand channel of a first execution unit in said first 
execution stage. 

2. The data processor as set forth in Claim 1 wherein said bypass circuitry further 
comprises a second plurality of bypass tristate line drivers having input channels coupled to said first 
output channels of said first plurality of said source devices and tristate output channels coupled to a 
second common read data channel in said read stage. 

3. The data processor as set forth in Claim 2 further comprising a first register file 
tristate line driver having an input channel coupled to a first one of said R read ports and an output 
channel coupled to said first common read data channel in said read stage. 

4. The data processor as set forth in Claim 3 further comprising a second register file 
tristate line driver having an input channel coupled to a second one of said R read ports and an output 
channel coupled to said second common read data channel in said read stage. 

5. (Cancelled). 

6. The data processor as set forth in Claim 4 further comprising a second multiplexer 
having a first input channel coupled to said second common read data channel and an output channel 
coupled to a second operand channel of said first execution unit in said first execution stage. 
7. The data processor as set forth in Claim 6 wherein said bypass circuitry comprises a
first bypass channel coupling an output channel of said first execution unit to a second input channel 
of said first multiplexer. 

8. The data processor as set forth in Claim 7 wherein said first bypass channel couples 
said output channel of said first execution unit to a second input channel of said second multiplexer. 

9. The data processor as set forth in Claim 8 wherein said bypass circuitry further 
comprises a second bypass channel coupling an output channel of a second execution unit in said 
first execution stage to a third input channel of said first multiplexer. 

10. The data processor as set forth in Claim 9 wherein said second bypass channel 
couples said output channel of said second execution unit to a third input channel of said second 
multiplexer. 

11. A processing system comprising: 

a data processor, wherein said data processor comprises: 
an instruction execution pipeline comprising: 
a read stage; 
a write stage; and 

a first execution stage comprising E execution units capable of producing data 
results from data operands; 

a register file comprising a plurality of data registers, each of said data registers 
capable of being read by said read stage of said instruction pipeline via at least one of R read ports of 
said register file and each of said data registers capable of being written by said write stage of said 
instruction pipeline via at least one of W write ports of said register file; and 

bypass circuitry capable of receiving data results from output channels of source 
devices in at least one of said write stage and said first execution stage, said bypass circuitry 
comprising: 

a first plurality of bypass tristate line drivers having input channels coupled to 
first output channels of a first plurality of said source devices and tristate output channels coupled to 
a first common read data channel in said read stage; and 

a first multiplexer having a first input channel coupled to said first common 
read data channel and an output channel coupled to a first operand channel of a first execution unit in 
said first execution stage; 

a memory coupled to said data processor; and 

a plurality of memory-mapped peripheral circuits coupled to said data processor for 
performing selected functions in association with said data processor. 
12. The processing system as set forth in Claim 11 wherein said bypass circuitry further
comprises a second plurality of bypass tristate line drivers having input channels coupled to said first 
output channels of said first plurality of said source devices and tristate output channels coupled to a 
second common read data channel in said read stage. 

13. The processing system as set forth in Claim 12 further comprising a first register file
tristate line driver having an input channel coupled to a first one of said R read ports and an output 
channel coupled to said first common read data channel in said read stage. 

14. The processing system as set forth in Claim 13 further comprising a second register 
file tristate line driver having an input channel coupled to a second one of said R read ports and an 
output channel coupled to said second common read data channel in said read stage. 

15. (Cancelled). 

16. The processing system as set forth in Claim 14 further comprising a second 
multiplexer having a first input channel coupled to said second common read data channel and an 
output channel coupled to a second operand channel of said first execution unit in said first execution 
stage. 

17. The processing system as set forth in Claim 16 wherein said bypass circuitry 
comprises a first bypass channel coupling an output channel of said first execution unit to a second 
input channel of said first multiplexer. 

18. The processing system as set forth in Claim 17 wherein said first bypass channel 
couples said output channel of said first execution unit to a second input channel of said second 
multiplexer. 

19. The processing system as set forth in Claim 18 wherein said bypass circuitry further
comprises a second bypass channel coupling an output channel of a second execution unit in said 
first execution stage to a third input channel of said first multiplexer. 

20. The processing system as set forth in Claim 19 wherein said second bypass channel 
couples said output channel of said second execution unit to a third input channel of said second 
multiplexer. 

21. The data processor of Claim 1, further comprising a latch coupled to the output 
channel of the first multiplexer and to the first operand channel of the first execution unit. 

22. The processing system of Claim 11, further comprising a latch coupled to the output
channel of the first multiplexer and to the first operand channel of the first execution unit. 
[57] ABSTRACT

A bypass control circuit uses a plurality of entries corresponding
to a plurality of addresses of a register file to grasp
in which one of eight result buffers a processing result of an
instruction having a destination address corresponding to
any of the plurality of entries exists. When a source address
of data to be required by a latch circuit matches with a
destination address of a processing result of an instruction
held in any of the eight result buffers, the processing result
of the instruction having the matching destination address is
transferred from a result buffer holding the processing result
of the instruction to the latch circuit. Thus, fast bypass
control can be achieved.

17 Claims, 22 Drawing Sheets 
PARALLEL PROCESSOR PERFORMING BYPASS CONTROL BY GRASPING
PORTIONS IN WHICH INSTRUCTIONS EXIST

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a parallel processor. It 
particularly relates to a parallel processor capable of fast
bypass control. 

2. Description of the Background Art 

A parallel processor provided with a plurality of pipelines 
is provided to improve processor performance. Pipeline 
processing and hazard will now be described with respect to 
a conventional scalar processor provided with one pipeline. 

Pipeline processing will be described first. Pipeline processing
is a technique in which a plurality of instructions
overlap for simultaneous execution. Currently, pipeline processing
is a basic technique for obtaining fast CPUs (Central
Processing Unit). In pipeline processing, one step of the
pipeline is responsible for a portion of an instruction and
executes it. The processing process for one instruction is
divided into a plurality of smaller processing units. The
smaller processing unit is referred to as a pipeline stage
(referred to as "a stage" hereinafter). The stages are connected
in order, to form one pipe.

Throughput of pipeline processing depends on the speed 
at which an instruction exits the pipeline. Since the stages 
are joined, all of the stages must complete their processings 
simultaneously. The time required for processing in one 
stage is referred to as "a machine cycle". The machine cycle 
is determined by the processing time of the stage with the 
slowest processing speed. 
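
The relationship between stage times, the machine cycle and throughput can be illustrated with a small sketch. The stage delays below are hypothetical values chosen for illustration, and the function names are not taken from this specification; the sketch also assumes a pipeline with no stalls.

```python
def machine_cycle(stage_times):
    """The machine cycle equals the processing time of the slowest stage."""
    return max(stage_times)


def pipelined_time(stage_times, n_instructions):
    """Total time for n instructions in a stall-free pipeline."""
    cycle = machine_cycle(stage_times)
    # The first instruction needs one cycle per stage; every later
    # instruction exits the pipeline one cycle after its predecessor.
    return cycle * (len(stage_times) + n_instructions - 1)


# Hypothetical stage delays (arbitrary time units) for a five-stage pipeline.
stage_times = [2, 3, 5, 4, 2]
print(machine_cycle(stage_times))       # 5
print(pipelined_time(stage_times, 10))  # 70
```

Any extra delay added to the slowest stage lengthens every cycle, which is why time-consuming control logic in a single stage affects the whole pipeline.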

Hazards will now be described. In pipeline processing,
there are situations in which instructions cannot be executed
in the appropriate machine cycle. Such situations are called
hazards. Hazards cause pipeline stalls, and pipeline stalls
degrade processor performance. The data hazard, which is
one kind of hazard, will now be described.

In pipeline processing, since executions of instructions
overlap, the relative execution timings of the instructions
change. This causes a hazard called a data hazard. A data
hazard arises when the order of accesses to an operand
differs between sequential execution and pipelined
execution of instructions. Consider one example in which
the execution process of an instruction has the five steps of 
(1) instruction fetch stage IF, (2) instruction decoding stage 
ID, (3) execution stage EX, (4) memory access stage MEM 
and (5) write back stage WB. It is also assumed that a new 
instruction is fetched per clock cycle. Furthermore, as for the 
instruction, assume an arithmetic instruction. In the first, 
instruction fetch stage IF, the instruction is fetched from an 
instruction cache (not shown) to an instruction decoder (not 
shown). In the second, instruction decoding stage ID, the 
fetched instruction is decoded by the instruction decoder, 
and according to the decoded instruction, an operand is 
fetched from a register file (not shown). In the third,
execution stage EX, the instruction is executed and the operation
is performed on the operand. The operation result is main- 
tained in a result buffer (not shown) in the execution stage 
EX. In the fourth, memory access stage MEM, the operation 
result maintained in the result buffer in the execution stage 
EX is maintained in a result buffer (not shown) in the 
memory access stage MEM. In the fifth, write back stage 
WB, the operation result maintained in the result buffer in
the memory access stage MEM is written into the register
file. Consider the case in which two arithmetic instructions
(an addition instruction ADD and a subtraction instruction
SUB) are pipelined.

ADD a1, a2, a3

SUB a4, a1, a5

In each of the addition instruction ADD and the subtraction
instruction SUB, the left field indicates a destination
address (an address of the register file for storing an
operation result), and the center and the right fields indicate
source addresses (addresses of the register file for storing
operands). The destination address a1 of the addition
instruction ADD is a source address a1 of the subtraction
instruction SUB. In such a case, a data hazard is caused.

FIG. 26 is a figure for explaining data hazard. Referring 
to FIG. 26, the addition instruction ADD exists in the 
instruction fetch stage IF at the first clock, in the instruction 
decoding stage ID at the second clock, in the execution stage 
EX at the third clock, in the memory access stage MEM at 
the fourth clock, and in the write back stage WB at the fifth 
clock. The subtraction instruction SUB exists in the instruc- 
tion fetch stage IF at the second clock, in the instruction 
decoding stage ID at the third clock, in the execution stage 
EX at the fourth clock, in the memory access stage MEM at 
the fifth clock, and in the write back stage WB at the sixth 
clock. The addition instruction ADD writes its operation 
result into the register file in the write back stage WB (at the 
fifth clock) according to the destination address a1 (arrow
A). Meanwhile, the subtraction instruction SUB fetches an 
operand from the register file in the instruction decoding 
stage ID (at the third clock) according to the source address 
a1 (arrow B). While the subtraction instruction SUB uses
the operation result of the addition instruction ADD, the 
operation result of the addition instruction ADD has not 
been written in the register file yet in the instruction decod- 
ing stage ID (at the third clock) at which the subtraction 
instruction SUB fetches the operand from the register file. 
Such a condition is called data hazard. If the data hazard is 
not avoided, the subtraction instruction SUB will fetch and 
use an inappropriate operand. That is, the subtraction 
instruction SUB will read out data from the address a1 of the
register file before the operation result of the addition 
instruction ADD is written into the register file, so that the 
subtraction instruction SUB may be inappropriately pro- 
cessed. 
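
The timing argument above can be checked with a short sketch. It is an illustrative model only: the tuple layout and function names are hypothetical, and it assumes that a register written in the write back stage WB can be read correctly by an instruction decoding stage ID that falls in the same clock.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_clock(fetch_clock, stage):
    """Clock at which an instruction fetched at fetch_clock occupies a stage."""
    return fetch_clock + STAGES.index(stage)


def has_data_hazard(producer, consumer):
    """Each instruction is (fetch_clock, destination, sources).

    A hazard exists when the consumer reads a source register in ID
    before the producer has written that register in WB.
    """
    p_clock, p_dest, _ = producer
    c_clock, _, c_sources = consumer
    if p_dest not in c_sources:
        return False
    return stage_clock(c_clock, "ID") < stage_clock(p_clock, "WB")


add = (1, "a1", ("a2", "a3"))  # ADD a1, a2, a3 fetched at the first clock
sub = (2, "a4", ("a1", "a5"))  # SUB a4, a1, a5 fetched at the second clock
print(has_data_hazard(add, sub))  # True: SUB reads a1 at clock 3, ADD writes at clock 5
```

An instruction that does not use a1 as a source, or one fetched late enough that its ID stage does not precede the write back of ADD, is reported as hazard-free by this model.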

Data hazard can be solved by a simple hardware technol- 
ogy called bypass. First, consider a processor including two 
latch circuits and an ALU (Arithmetic and Logic Unit) 
operating on two operands maintained in the two latch 
circuits. Furthermore, an operation result of the ALU is 
adapted to be fed back to the two latch circuits. Furthermore,
when the operation result of the ALU is equal to an operand 
for another operation to be carried out in the ALU, not an 
operand read from the register file but the operation result in 
the ALU is adapted to be used as an input for another 
operation to be carried out in the ALU. Such a scheme is 
called a bypass scheme. A bypass scheme which solves data 
hazard is disclosed in "Computer Architecture: A Quantita- 
tive Approach," David A. Patterson, John L. Hennessy, 
MORGAN KAUFMANN PUBLISHERS, Inc., for 
example. 

FIG. 27 illustrates a bypass scheme. Referring to FIG. 27,
the top shows pipeline processing for an addition instruction
ADD having a destination address a1 and source addresses
a2 and a3. The middle shows pipeline processing for a
subtraction instruction SUB having a destination address a4
and source addresses a1 and a5. The bottom shows pipeline
processing for an addition instruction ADD having a
destination address a6 and source addresses a1 and a7. In the
five-stage pipeline configuration shown in FIG. 27, the
operation result of the instruction (ADD a1, a2, a3) needs to be
bypassed not only to the next instruction (SUB a4, a1, a5) to
be input but also to the second next instruction (ADD a6, a1,
a7) to be input. In the second half of the instruction decoding
stage ID, an operand is fetched (i.e., data is read out) (R),
and the operation result is written in the first half of the write
back stage WB. Thus, the operation result of the instruction
(ADD a1, a2, a3) need not be bypassed to an instruction
input after the instruction (ADD a6, a1, a7), since the
operation result of the instruction (ADD a1, a2, a3) has been
written in the register file when the instruction input after the
instruction (ADD a6, a1, a7) moves to the execution stage
EX.

FIG. 28 is a schematic block diagram showing a conven- 
tional scalar processor having the bypass scheme. According 
to FIG. 28, the conventional scalar processor includes a 
register file 5, multiplexers 211 and 213, an ALU 215, result
buffers 217 and 219, two latch circuits (not shown), and four 
comparators (not shown). The pipeline configuration is the 
above mentioned five-stage configuration. 

Two result buffers 217 and 219 are provided for holding 
operation results of instructions until the instructions move 
to the write back stage WB. When an instruction which uses 
as an operand an operation result of a preceding instruction 
(i.e., an instruction executed earlier at ALU 215 by one or 
two instructions) enters the execution stage EX, the preced- 
ing instruction which generated the operation result serving 
as the operand has moved from the execution stage EX to the 
memory access stage MEM (in the case of the instruction 
executed earlier at ALU 215 by one instruction), or from the memory
access stage MEM to the write back stage WB (in the case 
of the instruction executed earlier at ALU 215 by two 
instructions) (see FIG. 27). 

Two operation results held in two result buffers 217 and 
219 can serve as either of inputs for two ports of ALU 215 
via two multiplexers 211 and 213. Multiplexers 211 and 213 
are controlled by determining whether any of source 
addresses of an instruction to be moved to the execution 
stage EX is the same as any of the destination addresses of 
two preceding instructions. If any of the source addresses of 
the instruction to be moved to the execution stage EX is the 
same as any of the destination addresses of the preceding 
instructions, the multiplexers 211 and 213 are controlled 
such that an operand is input not from register file 5 but from 
a result buffer (result buffer 217 or 219) in which an 
operation result of the instruction having the destination 
address exists. 

When source addresses of an instruction to be moved to 
the execution stage EX are the same as both of two desti- 
nation addresses of two preceding instructions, multiplexers 
211 and 213 are controlled such that an operation result of 
the instruction executed earlier at ALU 215 by one instruc- 
tion is input as an operand from result buffer 217. The 
inputting of an operand from a result buffer in which an 
operation result of the latest instruction of two preceding 
instructions exists when source addresses are the same as 
both of two destination addresses of the two preceding 
instructions, is referred to as priority selection. 
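
The multiplexer control and the priority selection described above can be sketched in software. This is a simplified model, not the circuit of FIG. 28: each result buffer is represented as a hypothetical (destination, value) pair, and the function name is an assumption made for illustration.

```python
def select_operand(src, register_file, buffer_217, buffer_219):
    """Choose the operand for one ALU port.

    buffer_217 models the result of the instruction executed one
    instruction earlier; buffer_219 models the result of the instruction
    executed two instructions earlier. Each is (destination, value) or None.
    """
    # Priority selection: the latest matching result wins.
    if buffer_217 is not None and buffer_217[0] == src:
        return buffer_217[1]
    if buffer_219 is not None and buffer_219[0] == src:
        return buffer_219[1]
    # No match: the operand comes from the register file.
    return register_file[src]


register_file = {"a1": 0, "a5": 7}
print(select_operand("a1", register_file, ("a1", 10), ("a1", 3)))  # 10
print(select_operand("a1", register_file, None, ("a1", 3)))        # 3
print(select_operand("a5", register_file, ("a1", 10), ("a1", 3)))  # 7
```

Each equality test in this sketch stands for one hardware comparator, so one source address needs two comparators and two source addresses need four.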

Comparison of source addresses of an instruction to be
moved to the execution stage EX with two destination
addresses of two preceding instructions for controlling
multiplexers 211 and 213 is carried out by two comparators. As
two latch circuits and four comparators are provided, as
described above, one latch circuit and two comparators are
used for one source address since there are two source
addresses. Since ALU 215 completes operation in one stage,
the pipeline is not stalled for any combination of instructions
as long as bypassing is provided.

A conventional parallel processor will now be described.
As described above, the parallel processor is provided with
a plurality of pipelines and starts execution of a plurality of
instructions per clock cycle. For example, a plurality of
operations are performed based on a plurality of arithmetic
instructions in one clock cycle. Starting execution of an
instruction is also referred to as issuing an instruction. The
VLIW (Very Long Instruction Word) processor is one such
parallel processor. In the VLIW processor, a plurality of
operations which can be executed in parallel are assigned
within one instruction. That is, a plurality of operations
corresponding to a plurality of scalar instructions are assigned
to one instruction in the VLIW processor. Thus, a smaller
number of instructions is required for execution of one program
in the VLIW processor than in a scalar processor. In the
VLIW processor, the plurality of operations assigned to one
instruction are executed using a plurality of independent
functional units. The functional units include ALUs. In the
VLIW processor, an instruction including the plurality of
operations is referred to as "a basic instruction" and
operations included in one basic instruction are simply
referred to as "instructions" hereinafter.

In the VLIW processor also, each functional unit is
pipelined. This causes data hazards. For this reason, the
bypass scheme is provided. In the VLIW processor, a large
number of instructions are executed in parallel. Thus, its
bypass control, which determines which operation result
should be bypassed, is complicated. Consider an example in
which the pipeline configuration of each functional unit is
the above mentioned five-stage configuration. In this
example, with a scalar processor, four comparators need be
provided for comparing two destination addresses of two
instructions existing in the execution stage EX and the
memory access stage MEM with two source addresses of
one instruction existing in the instruction decoding stage ID.
Meanwhile, with the VLIW processor issuing four
instructions, sixty-four comparators need be provided as it is
provided with four functional units. That is, eight
comparators need be provided for one source address.
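
The comparator counts quoted above follow from multiplying the number of destination addresses held in the pipeline by the number of source addresses to be checked. The following sketch reproduces the arithmetic; the function and parameter names are hypothetical, and it assumes two source addresses per instruction and two bypassed stages (EX and MEM) per functional unit, as in the example.

```python
def comparators_needed(issue_width, sources_per_instruction=2, bypassed_stages=2):
    """Comparators required when every source is compared with every destination."""
    destinations = issue_width * bypassed_stages      # results held in EX and MEM
    sources = issue_width * sources_per_instruction   # operands read in ID
    return destinations * sources


print(comparators_needed(1))  # 4  (scalar processor)
print(comparators_needed(4))  # 64 (VLIW processor issuing four instructions)
```

For the four-issue case there are eight source addresses, so the sixty-four comparators correspond to eight comparators per source address.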

Furthermore, the priority selection must be performed in
order to bypass an operation result of the latest instruction.
With the scalar processor, the priority selection may be
performed among two operation results. However, with the
VLIW processor issuing four instructions, which is provided
with four functional units, the priority selection must be
performed among eight operation results.

Thus, in the conventional VLIW processor, as the number
of instructions which can be issued in parallel increases, the
number of the comparators required and hence the number
of objects to which the priority selection is applied increase.
Thus, as the number of instructions which can be issued in
parallel increases, processing complexity increases
exponentially. As processing is complicated, the time required for
the processing increases. The machine cycle is determined
depending on the processing time of a stage with the slowest
processing speed. Thus, bypass control becomes
time-consuming and hence the time required for processing at one
stage is increased so that the machine cycle increases. The
increase in the machine cycle directly affects processor
performance, leading to degradation in processor
performance.
Thus, in the VLIW processor as a conventional parallel
processor, as the number of instructions which can be issued 
in parallel increases, processing is complicated and bypass 
control becomes time-consuming, resulting in degradation 
in performance. 

SUMMARY OF THE INVENTION 

The present invention is made to solve the problem 
described above and it contemplates a parallel processor 
capable of fast bypass control. 

A parallel processor according to a first aspect of the 
present invention has a register file for storing therein a 
processing result of an instruction according to a destination 
address of the instruction. The parallel processor also pro- 
cesses a plurality of instructions included in one basic 
instruction in parallel. The parallel processor is provided 
with a plurality of functional units, a bypass circuit and a 
bypass control circuit. Each functional unit processes a 
corresponding instruction. Each functional unit also has a 
plurality of processing stages in which the corresponding,
successively input instructions are pipelined. The bypass
circuit is provided for selectively providing a plurality of 
processing results existing in the plurality of processing 
stages in the plurality of functional units to a plurality of the 
initial processing stages in the plurality of functional units. 
The bypass control circuit grasps in which processing stage 
of which functional unit an instruction having a destination 
address corresponding to an entry exists using a plurality of 
entries corresponding to a plurality of addresses of the 
register file. When a destination address of an instruction 
existing in any of the plurality of processing stages in the 
plurality of functional units matches with a source address of 
an instruction to be processed in the initial processing stage 
in a functional unit, the bypass control circuit controls the
bypass circuit such that a processing result of the instruction 
having the matching destination address is supplied from the 
processing stage in which the instruction having the match- 
ing destination address exists to the initial processing stage 
in which the instruction having the matching source address 
is to be processed. Furthermore, when an instruction having 
a destination address is grasped and a new instruction having 
the same destination address as the destination address is 
input to any of the plurality of functional units, the
bypass control circuit grasps the new, input instruction by an 
entry corresponding to the destination address. 

Thus, a parallel processor according to the first aspect of 
the present invention grasps in which processing stage of 
which functional unit an instruction having a destination 
address corresponding to an entry exists by a plurality of 
entries corresponding to a plurality of addresses of a register 
file. Furthermore, when an instruction having a destination 
address is grasped and a new instruction having the same 
destination address as the destination address is input to any 
of the plurality of functional units, the new, input instruction 
is grasped by an entry corresponding to the destination 
address. This dispenses with a comparator for comparing 
addresses as well as the priority selection. Consequently, 
circuitry in the parallel processor according to the first 
aspect of the present invention is simplified, allowing fast 
bypass control. 
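
The idea of replacing address comparators with one entry per register file address can be modelled in software as follows. This is only a rough sketch under stated assumptions, not the circuit of the embodiments: the class name, the method names and the two stage labels "EX" and "MEM" are hypothetical, and each entry simply records in which functional unit and stage the latest instruction writing that address exists.

```python
class BypassTable:
    """One entry per register file address: (unit, stage) or None."""

    def __init__(self, n_registers):
        self.entries = [None] * n_registers

    def issue(self, unit, dest):
        # A new instruction with the same destination address overwrites
        # the entry, so the latest producer is always the one recorded
        # and no priority selection is needed.
        self.entries[dest] = (unit, "EX")

    def advance(self):
        # Move each tracked instruction to its next stage. After MEM the
        # result is written into the register file, so the entry is cleared.
        for addr, entry in enumerate(self.entries):
            if entry is None:
                continue
            unit, stage = entry
            self.entries[addr] = (unit, "MEM") if stage == "EX" else None

    def lookup(self, src):
        # Indexing by the source address replaces the address comparators.
        return self.entries[src]


table = BypassTable(16)
table.issue(unit=0, dest=1)
print(table.lookup(1))       # (0, 'EX')
table.advance()
table.issue(unit=2, dest=1)  # newer instruction with the same destination
print(table.lookup(1))       # (2, 'EX')
```

A lookup that returns None in this model means the operand is taken from the register file rather than from a result buffer.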

A parallel processor according to a second aspect of the
present invention has a register file for storing therein a
processing result of an instruction according to a destination
address of the instruction. The parallel processor processes
a plurality of instructions included in one basic instruction in
parallel. The parallel processor includes a plurality of
functional units, a bypass circuit and a bypass control circuit.
Each functional unit processes a corresponding instruction.
Each functional unit also has a plurality of processing stages
in which the corresponding, successively input instructions
are pipelined. The bypass circuit is provided for selectively
supplying a plurality of processing results existing in the
plurality of processing stages in the plurality of functional
units to a plurality of initial processing stages in the plurality
of functional units. The bypass control circuit grasps in
which functional unit an instruction having a destination
address corresponding to an entry exists by a plurality of
entries corresponding to a plurality of addresses of the
register file. When a destination address of an instruction
existing in any of the plurality of processing stages in the
plurality of functional units matches with a source address of
an instruction to be processed at the initial processing stage of
a functional unit, the bypass control circuit controls the
bypass circuit such that a processing result of the instruction
having the matching destination address from the processing
stage in which the instruction having the matching
destination address exists is supplied to the initial processing stage
in which the instruction having the matching source address
is to be processed. Furthermore, when an instruction having
a destination address is grasped and a new instruction having
the same destination address as the destination address is
input to any of the plurality of functional units, the bypass
control circuit grasps the new, input instruction by an entry
corresponding to the destination address.

Thus, a parallel processor according to the second aspect
of the present invention grasps in which functional unit an
instruction having a destination address corresponding to an
entry exists using a plurality of entries corresponding to a
plurality of addresses of a register file. When an instruction
having a destination address is grasped and a new instruction
having the same destination address as the destination
address is input to any of the plurality of functional units, the
new, input instruction is grasped by an entry corresponding
to the destination address. This reduces the number of
comparators for comparing addresses as compared with
conventional parallel processors. This also reduces the
frequency of comparison for the priority selection as compared
with conventional parallel processors. Consequently, fast
bypass control can be achieved in the parallel processor
according to the second aspect of the present invention.

A parallel processor according to a third aspect of the
present invention has a register file for storing therein a
processing result of an instruction according to a destination
address of the instruction. The parallel processor processes
a plurality of instructions included in one basic instruction in
parallel. The parallel processor includes a plurality of
functional units, a bypass circuit and a bypass control circuit.
Each functional unit processes a corresponding instruction.
Each functional unit also has a plurality of processing stages
in which the corresponding, successively input instructions
are pipelined. The bypass circuit is provided for selectively
supplying a plurality of processing results existing in the
plurality of processing stages in the plurality of functional
units to a plurality of initial processing stages in the plurality
of functional units. The bypass control circuit grasps in
which processing stage an instruction having a destination
address corresponding to an entry exists using a plurality of
entries corresponding to a plurality of addresses of the
register file. When a destination address of an instruction
existing in any of the plurality of processing stages in the
plurality of functional units matches with a source address
of an instruction to be processed at the initial stage of a
functional unit, the bypass control circuit controls the bypass
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circuit such that an processing result of the instruction FIG. 13 is a circuit diagram showing the detail of control 

having the matching destination address from the processing circuit Sl-1 (FIG. 7) when a characteristic portion of a 

stage in which the instruction having the matching destina- VLIW processor according to the second embodiment of the 

tion address exists is supplied to the initial processing stage present invention is combined with that of a VLIW proces- 

in which the instruction having the matching source address 5 sor according to the third embodiment of the present inven- 

exists. Furthermore, when an instruction having a destina- tion. 

tion address is grasped and a new instruction having the FIG. 14 is a schematic block diagram showing a bypass 

same destination as the destination address is input to any of control circuit (FIG. 3) of a VUW processor according to a 

the plurality of functional units, the bypass control circuit f ourm embodiment of the present invention, 

grasps the new, input instruction by a entry corresponding to 10 mQ 15 ^ a ^tomMc block diagram showing the control 

the destination address. signal gener ating circuit of FIG. 14. 

Tims, a parallel processor according to the third aspect of mQ u fc & drcuit d{ showin ^ ^ Qf ^ 

the present invention grasps in which processing stage an control circuil sj.j 0 f pj G 15 

instruction having a destination address corresponding to an ^ ^ . ....*' - 

entry exists by a plurality of entries corresponding to a u . HG - a ™ l d ' a S ram sno^S. 11 * detai1 °} c ^}, 

plurality of addresses of the register file. When an instruc- circuit Sl-1 (FIG- 15) when characteristic portion of a VLIW 

tion having a destination address is grasped and a new processor according ,to the third embodiment of the present 

instruction having the same destination address as the des- mention is combined with that of a VLIW processor 

tination address is input to any of a plurality of functional according to the fourth embodiment of the present invention, 

units, the new, input instruction is grasped by an entry 20 FIG. 18 is a schematic block diagram showing a portion 

corresponding to the destination address. Thus, the number of a VLIW processor according to a fifth embodiment of the 

of comparators for comparing addresses is reduced as com- present invention. 

pared with conventional parallel processors. Accordingly, FIG. 19 is a schematic block diagram showing the bypass 

the frequency of comparison for the priority selection is control circuit of FIG. 18. 

reduced as compared with conventional parallel processors. 25 pjQ 20 is a schematic block diagram showing the conrol 

Consequently, fast bypass control is achieved in the parallel s j gna ] generating circuit of FIG 19 

processor according to the third aspect of the present inven- pjo. 21 is a circuit diagram showing the detail of the 

U0IL control circuit Sl-1 of FIG. 20. 

Tne foregoing and other objects, features aspects and ^ ^ sfa 

advantages of be present invent™ , will become more ^ ^ a chan £ eristic ^ of a 

apparent from the following detailed description of he ^ ^ r ^ embod f m ent of , he 

present invention when taken m conjunction with the resent mvention is combined witQ that of a VL IW proces- 

accompanying drawing*. m according t0 tne ^ embodiment 0 f the present inven- 

BRIEF DESCRIPTION OF THE DRAWINGS 35 «ion. 

„,„ , . . .. , ,. . FIG. 23 is a schematic block diagram showing the bypass 

FIG. 1 is a schematic block diagram showing a VLIW . . . .„,_ ,„. . ,„ » 0 J r . 

processor according to a first embodiment of the present contr ° 1 ««uu.(FIG. 18 of a VLIW processor according to 

• a sixth embodiment of the present invention, 

invention. r 

FIG. 2 shows a form of a basic instruction decoded at the M . FI °' 24 15 «.**°n»nc blcok diagram showing the control 

instruction decoding stage ID of the VLIW processor shown ° S1 8 nal g eneralln g «reurt of FIG. 23. 

in FIG. 1. FIG. 25 is a circuit diagram showing the detail of the 

FIG. 3 is a schematic block diagram showing a portion of contro1 circuit S11 of FIG - 24 

the VLIW processor of FIG. 1. FIG. 26 illustrates data hazard in a conventional scalar 

FIG. 4 shows correspondence of tristate buffers (FIG. 3) 45 Pr 0065501 "- 

to control signals which controls the tristate buffers. FIG. 27 illustrates a bypass scheme in a conventional 

FIG. 5 is a schematic block diagram showing the bypass scalar processor, 

grasping circuit of FIG. 3. FIG. 28 is a schematic block diagram showing a conven- 

FIG. 6 is a schematic block diagram showing an instruc- tional processor having the bypass scheme, 
tion grasping circuit of FIG. 5. 50 

n% n ■ u w u * <u ♦ 1 DESCRIPTION OF THE PREFERRED 

FIG. 7 is a schematic block diagram showing the control _^ , T ^^_ XTi ,„ XT ™ 

, .••■*£ no T EMBODIMENTS 
signal generating circuit of FIG. 5. 

FIG. 8 is a circuit diagram showing the detail of control A VLIW processor as a parallel processor according to the 

circuit Sl-1 of FIG. 7. present invention will now be described with reference to 

FIG. 9 is a schematic block diagram showing a portion of the figures. As described above, the VLIW processor pro- 

a stage field (FIG. 5) and a stage field control circuit cesses a plurality of instruction included in one basic 

controlling a portion of the stage field. instruction in parallel. The signals PIPE [0], PIPE [1], PIPE 

FIG. 10 is a circuit diagram showing the detail of control PI PIPE PI- STAGE [0], STAGE [1] and STAGE [2] 

circuit Sl-1 (FIG. 7) of a VUW processor according to a 60 correspond to the PIPE (1), PIPE (2), PIPE (3), STAGE (0), 

second embodiment of the present invention. STAGE (1) and STAGE (2) shown in the figures, respec- 

FIG. 11 is a circuit diagram showing the detail of control tively. 

C ? C f S l'\^ G - 7 ) f 3 VUW processor accordin S to a [FI RST EMBODIMENT] 

third embodiment of the present mvention. L J 

FIG. 12 is a schematic block diagram showing a portion 65 FIG. 1 is a schematic block diagram showing a VLIW 

of a stage field (FIG. 5) and a bit shifter controlling abortion processor according to a first embodiment of the present 

of the stage field. invention. Referring to FIG. 1, the VLIW processor accord- 
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ing to the first embodiment of the present invention includes In the instruction decoding stage ID register file 5 is 

an instruction cache 1, a decoder 3, a register file 5, accessed in order to obtain operands for their respective 

functional units 7-1, 7-2, 7-3, 7-4, a data cache 9, and a instructions. The obtained operands (i.e., data within register 

bypass control circuit 13. It is assumed that functional units file 5) are supplied to their respective functional units 7-1 to 

7-1 to 7-4 can execute any instruction. Eight data (i.e., eight 5 7.4 { n the execution stage EX. 

operands) corresponding to source addresses of four instruc- FIG 3 ^ a schematic block diagram showing a portion of 

tions can be read out from register file 5 at one time. tfae processor of FIG x similar portions thereof to 

Furthermore, processing results of four instructions can be ^ shown ^ nG ± m kbeled b ^ same refefence 

nATi nmt0 K reglS !f r characters and the descriptions thereof are, where 

DATA can be written into data cache 9 at one time according , r 

to four addresses ADDRs. Furthermore, four data DATA can 10 appropriate, not repeated. 

be read out from data cache 9 at one time according to four Referring to FIG. 3, a portion of the VLIW processor of 

addresses ADDRs. FIG- 1 includes a register file 5, a bypass control circuit 13, 

The pipeline processing will now be described. The latch circuits L1-L8, ALUs al-a4, result buffers el-e4, 

pipeline configuration is a five-stage configuration. It is ml-m4, tristate buffers T1-T72 and buses 1-1 to 4-2. Latch 

formed of the first, instruction fetch stage IF, the second, 15 circuits LI and L2, ALU al, and result buffers el and ml 

instruction decoding stage ID, the third, execution stage EX, form functional unit 7-1. Latch circuit L3 and L4, ALU a2, 

the fourth, memory access stage MEM, and the fifth, write and result buffers e2 and m2 form functional unit 7-2. Latch 

back stage WB. In the instruction fetch stage IF, a basic circuits L5 and L6, ALU a3, and result buffers e3 and m3 

instruction is fetched (i.e., read out) from instruction cache form functional unit 7-3. Latch circuits L7 and L8, ALU a4,

1. In the instruction decoding stage ID, the basic instruction 20 and result buffers e4 and m4 form functional unit 7-4.

is decoded by decoder 3. Four instructions included in the Latch circuit L1 connects with: result buffer e1 via bus 1-1

decoded basic instruction are input to and processed in four and tristate buffer T1; result buffer m1 via bus 1-1 and

functional units 7-1 to 7-4. tristate buffer T9; result buffer e2 via bus 1-1 and tristate

FIG. 2 is a schematic diagram showing a form of a buffer T17; result buffer m2 via bus 1-1 and tristate buffer

decoded basic instruction. Referring to FIG. 2, the basic 25 T25; result buffer e3 via bus 1-1 and tristate buffer T33;

instruction is formed of: an instruction formed of an op code result buffer m3 via bus 1-1 and tristate

op1, a destination address des1 and source addresses src1-1, buffer T41; result buffer e4 via bus 1-1 and tristate buffer T49; and result buffer

src1-2 (i.e., an instruction to be input to functional unit 7-1);

m4 via bus 1-1 and tristate buffer T57.

an instruction formed of an op code op2, a destination

address des2 and source addresses src2-1, src2-2 (i.e., an 30 Latch circuit L2 connects with result buffers e1-e4

instruction to be input to functional unit 7-2); an instruction and m1-m4 via bus 1-2 and tristate

formed of an op code op3, a destination address des3 and buffers T2, T10, T18, T26, T34, T42, T50 and T58.

source addresses src3-1, src3-2 (i.e., an instruction to be Similarly, latch circuit L3 connects with result buffers

input to functional unit 7-3); and an instruction formed of an e1-e4 and m1-m4 via bus 2-1 and tristate buffers T3, T11, T19,

op code op4, a destination address des4 and source addresses 35 T27, T35, T43, T51 and T59. Latch circuit L4 similarly

src4-1, src4-2 (i.e., an instruction to be input to functional connects with result buffers e1-e4 and m1-m4 via bus 2-2

unit 7-4). The op codes op1-op4 indicate types of operations. and tristate buffers T4, T12, T20, T28, T36, T44, T52 and

T60. Latch circuit L5 similarly connects with result buffers

Referring again to FIG. 1, processings at the execution

e1-e4 and m1-m4 via bus 3-1 and tristate buffers T5, T13,

stage EX, memory access stage MEM and write back stage

T21, T29, T37, T45, T53 and T61. Latch circuit L6 similarly

WB will now be described according to types of instruc- 40

connects with result buffers e1-e4 and m1-m4 via bus 3-2

tions. When an instruction is an arithmetic instruction, the

arithmetic instruction is executed, that is, an operation is and tristate buffers T6, T14, T22, T30, T38, T46, T54 and

performed in the execution stage EX and its operation result T62. Latch circuit L7 similarly connects with result buffers

e1-e4 and m1-m4 via bus 4-1 and tristate buffers T7, T15,

(i.e., its processing result) is held in a result buffer (not T23, T31, T39, T47, T55 and T63. Latch circuit L8 similarly

shown) of the execution stage EX. In the memory access 45

connects with result buffers e1-e4 and m1-m4 via bus 4-2

stage MEM, the operation result held in the result buffer of

and tristate buffers T8, T16, T24, T32, T40, T48, T56 and

the execution stage EX is held in a result buffer (not shown) T64.

of the memory access stage MEM. In the write back stage 

WB, the operation result held in the result buffer of the Latch circuit L1 connects with register file 5 via bus 1-1 

memory access stage MEM is written into register file 5. 50 and tristate buffer T65. Latch circuit L2 connects with

When an instruction is a memory access instruction, an register file 5 via bus 1-2 and tristate buffer T66. Latch

address is calculated and held in a result buffer of the circuit L3 connects with register file 5 via bus 2-1 and

execution stage EX in the execution stage EX. In the tristate buffer T67. Latch circuit L4 connects with register

memory access stage MEM, data cache 9 or register file 5 is file 5 via bus 2-2 and tristate buffer T68. Latch circuit L5

accessed according to the address held in the result buffer of 55 connects with register file 5 via bus 3-1 and tristate buffer

the execution stage EX. When a memory access instruction T69. Latch circuit L6 connects with register file 5 via bus 3-2

is a loading instruction, data is read out from data cache 9 and tristate buffer T70. Latch circuit L7 connects with

and held in a result buffer of the memory access stage MEM. register file 5 via bus 4-1 and tristate buffer T71. Latch

When a memory access instruction is a storing instruction, circuit L8 connects with register file 5 via bus 4-2 and

data is read out from the register file 5 and held in a result 60 tristate buffer T72.

buffer of the memory access stage MEM. In the write back Tristate buffers T1-T72 are controlled to be turned on/off 

stage WB, when a memory access instruction is a loading according to corresponding control signals e1-1-1 to r-4-2

instruction, the data held in the result buffer of the memory from bypass control circuit 13.

access stage MEM is written into register file 5, and when When a tristate buffer is turned on by a corresponding

a memory access instruction is a storing instruction, the data 65 control signal, data is transferred from a result buffer cor-

held in the result buffer of the memory access stage MEM responding to the tristate buffer or from the register file to a

is written into data cache 9. latch circuit corresponding to the tristate buffer.
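The wiring listed above follows a regular numbering pattern, which the following Python sketch makes explicit. The formula is inferred from the buffer numbers given in the text (for example T1, T9, T17, ..., T57 and T65 on bus 1-1); the source does not state it as a rule, and the function names are hypothetical.

```python
# Sketch only: the numbering rule below is inferred from the buffer lists in the
# text and is an assumption, not a formula given by the source.

# Data sources in the order they appear for each latch circuit:
# result buffers e1, m1, e2, m2, e3, m3, e4, m4, then register file 5 ("r").
SOURCES = ["e1", "m1", "e2", "m2", "e3", "m3", "e4", "m4", "r"]


def bus_for_latch(latch: int) -> str:
    """Name of the bus feeding latch circuit L<latch> (L1 -> '1-1', ..., L8 -> '4-2')."""
    if not 1 <= latch <= 8:
        raise ValueError("latch must be in 1..8")
    return f"{(latch + 1) // 2}-{2 - latch % 2}"


def tristate_for(source: str, latch: int) -> int:
    """Number n of the tristate buffer Tn between a data source and latch circuit L<latch>."""
    if not 1 <= latch <= 8:
        raise ValueError("latch must be in 1..8")
    return 8 * SOURCES.index(source) + latch


def control_signal(source: str, latch: int) -> str:
    """Control signal name in the style used by the text, e.g. 'e4-1-1' or 'r-4-2'."""
    return f"{source}-{bus_for_latch(latch)}"


if __name__ == "__main__":
    for src in SOURCES:
        print(control_signal(src, 1), "-> T%d" % tristate_for(src, 1))
```

Under this reading, each of the eight latch circuits has nine possible data sources, which accounts for the 72 tristate buffers T1-T72.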



5,805,852 

11 12 

FIG. 4 shows correspondence between control signals those shown in FIG. 5 are labeled by the same reference

e1-1-1 to r-4-2 generated by bypass control circuit 13 for characters and the descriptions thereof are, where

controlling tristate buffers T1-T72 and tristate buffers appropriate, not repeated. Referring to FIG. 6, the instruc-

T1-T72. Referring to FIG. 4, T1-T72 indicate tristate buff- tion grasping circuit is divided into a plurality of entries

ers T1-T72 shown in FIG. 3. Furthermore, in FIG. 4, "TSB" 5 f1-fn. Entries f1-fn are provided corresponding to addresses

stands for "tristate buffer". E1-1-1 to r-4-2 indicate control of register file 5 of FIG. 3, and the number of the entries is

signals for controlling tristate buffers T1-T72. In FIG. 4, a equal to the number of addresses of register file 5 of FIG. 3.

tristate buffer and a control signal indicated in one box For example, entry f1 corresponds to an address "1" of

correspond to each other. For example, tristate buffer T1 is register file 5 of FIG. 3. Further, since a destination

address "1" indicates the address "1" of register file 5 of

controlled to be turned on/off by control signal e1-1-1. 10 FIG. 3, the destination address "1" corresponds to the entry

Referring again to FIG. 3, bypass control will now be f1. Also, since a source address "1" indicates the address "1"

simply described. Source addresses of data required for latch of register file 5 of FIG. 3, the source address "1" corre-

circuits L1, L2, L3, L4, L5, L6, L7 and L8 are src1-1, src1-2, sponds to the entry f1.

src2-1, src2-2, src3-1, src3-2, src4-1 and src4-2, respec- Referring again to FIG. 5, field control circuit 21-1

tively. 15 receives a destination address des1 of an instruction to be

Bypass control circuit 13 grasps in which stage of which input to functional unit 7-1. Then, field control circuit 21-1 

functional unit an instruction grasped by bypass control generates a plurality of signals for updating data for an entry 

circuit 13 exists. When a destination address of an instruc- (see FIG. 6) corresponding to the received destination 

tion existing in any of eight stages in four functional units address des1. The plurality of signals are signals ADDRESS,

7-1 to 7-4 matches with any of source addresses src1-1 to 20 PIPE SET, VALID SET and STAGE RESET. Field control

src4-2 of instructions executed in the execution stage EX, circuits 21-2, 21-3 and 21-4 receive destination addresses

bypass control circuit 13 transfers a processing result (i.e., des2, des3 and des4 of instructions to be input to functional

an operation result) of the instruction having the matching units 7-2, 7-3 and 7-4, respectively. The operations of field

destination address from a result buffer of a stage in which control circuits 21-2 to 21-4 are similar to that of field

the instruction having the matching destination address 25 control circuit 21-1.

Instruction grasping circuit 15 grasps in which stage of

exists to a latch circuit corresponding to the matching source

which functional unit an instruction having a destination

address. That is, bypass control circuit 13 turns on a tristate

address corresponding to an entry exists by the entry. That

buffer between a latch circuit corresponding to a matching

is, instruction grasping circuit 15 grasps in which result

source address and a result buffer of a stage in which an

buffer of which functional unit a processing result (i.e., an

instruction having a matching destination address exists by

operation result) of an instruction having a destination

a control signal. address corresponding to an entry currently exists by the

On the other hand, when a source address does not match entry. Validity field 23 indicates whether data in pipe field 25

with any of destination addresses of instructions grasped by and stage field 27 are valid or invalid. Pipe field 25 indicates

bypass control circuit 13, bypass control circuit 13 transfers 35 in which functional unit an instruction having a destination

data from register file 5 to a latch circuit corresponding to address corresponding to an entry currently exists. That is,

the source address according to the source address. That is, it indicates in which functional unit a processing result (i.e.,

bypass control circuit 13 turns on a tristate buffer connected an operation result) of an instruction having a destination

between a latch circuit corresponding to a source address address corresponding to an entry exists. Stage field 27

which does not match with any of destination addresses of 40 indicates in which stage an instruction having a destination

instructions grasped by bypass control circuit 13 and register address corresponding to an entry currently exists. That is, 

file 5 by a control signal. it indicates in which result buffer a processing result (i.e., an 

Referring to FIGS. 3 and 4, bypass control will be operation result) of an instruction having a destination 

specifically described. An operation result (i.e., data) of address corresponding to an entry exists. 

ALU a4 held in result buffer e4 is assumed to match with a 45 Validity field 23, pipe field 25 and stage field 27 are set or

source address of data to be held in latch circuit L1. The reset according to the plurality of signals generated by field

matching is detected by bypass control circuit 13 and bypass control circuits 21-1 to 21-4. A signal ADDRESS deter-

control circuit 13 sets control signal e4-1-1 to "1". Control mines an entry to be set or reset according to a destination

signal e4-1-1 thus set to "1" turns on tristate buffer T49. address input to a field control circuit. That is, a signal

Thus, the operation result (i.e., data) of ALU a4 held in result 50 ADDRESS is provided for selecting an entry corresponding

buffer e4 is transferred to latch circuit L1 by bus 1-1. to a destination address input to a field control circuit. A

In the VLIW processor, dissimilar to a scalar processor, signal VALID SET sets validity field 23 for an entry accord-
eight operation results (i.e., eight data) held in eight result ing to a signal ADDRESS. This indicates that data in pipe
buffers e1-e4 and m1-m4 of four functional units 7-1 to 7-4 field 25 and stage field 27 for the entry according to the
can serve as any of inputs of four ALUs a1-a4 of four 55 signal ADDRESS are valid. A signal PIPE SET sets pipe
functional units 7-1 to 7-4. field 25 for an entry according to a signal ADDRESS, that

FIG. 5 is a schematic block diagram showing bypass is, a signal PIPE SET sets pipe field 25 to indicate a 

control circuit 13 of FIG. 3. Referring to FIG. 5, the bypass functional unit to which an instruction having a destination 

control circuit includes field control circuits 21-1, 21-2, address input to a field control circuit is input. A signal 

21-3, 21-4, an instruction grasping circuit 15, an address 60 STAGE RESET resets stage field 27 for an entry according 

decoder 17, and a control signal generating circuit 19. to a signal ADDRESS. Furthermore, stage field 27 for an 

Instruction grasping circuit 15 is formed of a field indicating entry according to a signal ADDRESS is newly set when- 

validness/invalidness (referred to as "a validity field" ever an instruction having a destination corresponding to the 

hereinafter) 23, a functional unit field (referred to as "a pipe entry moves to another stage, which will be described in 

field" hereinafter) 25, and a stage field 27. 65 detail later. 
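The entry fields and the four update signals described above can be illustrated with a short Python sketch. It is a behavioural model under stated assumptions, not the circuit of FIGS. 5 and 6: the class and method names are hypothetical, the table is indexed directly by register address, and pipe values 0 to 3 are taken to stand for functional units 7-1 to 7-4.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One entry of the instruction grasping circuit (one per register address)."""
    valid: int = 0  # validity field 23: 1 when pipe and stage are meaningful
    pipe: int = 0   # pipe field 25: assumed 0..3 for functional units 7-1..7-4
    stage: int = 0  # stage field 27: 0 after STAGE RESET


class InstructionGraspingTable:
    """Hypothetical software model of the entry table; names are illustrative."""

    def __init__(self, num_registers: int) -> None:
        # The text says the number of entries equals the number of register
        # file addresses; num_registers is a parameter of this sketch.
        self.entries = [Entry() for _ in range(num_registers)]

    def issue(self, dest: int, unit: int) -> None:
        """Model a field control circuit receiving destination address `dest`
        for an instruction input to functional unit index `unit`."""
        entry = self.entries[dest]  # signal ADDRESS selects the entry
        entry.valid = 1             # signal VALID SET
        entry.pipe = unit           # signal PIPE SET
        entry.stage = 0             # signal STAGE RESET

    def lookup(self, src: int) -> tuple:
        """Return (VALID, PIPE, STAGE) for the entry matching a source address."""
        entry = self.entries[src]
        return entry.valid, entry.pipe, entry.stage


if __name__ == "__main__":
    table = InstructionGraspingTable(num_registers=32)
    table.issue(dest=1, unit=0)
    print(table.lookup(1))
```

Because there is exactly one entry per destination address, issuing a second instruction with the same destination simply overwrites the entry, so a lookup never has to choose between several candidates.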

FIG. 6 is a schematic block diagram showing instruction Address decoder 17 receives eight source addresses 

grasping circuit 15 of FIG. 5. Similar portions thereof to src1-1 to src4-2 of four instructions from decoder 3 of FIG.
1, and decodes them for transfer to instruction grasping control signals e1-3-2 to e4-3-2, m1-3-2 to m4-3-2, and

circuit 15. Instruction grasping circuit 15 transfers data in r-3-2. Control circuit S4-1 is used corresponding to source

fields (i.e., validity fields 23, pipe fields 25 and stage fields address src4-1, and receives signals VALID, PIPE and

27) for entries corresponding to the eight source addresses STAGE from fields 23, 25 and 27 for an entry corresponding

src1-1 to src4-2 transferred from address decoder 17 to 5 to source address src4-1. Then control circuit S4-1 generates

control signal generating circuit 19. Since eight source control signals e1-4-1 to e4-4-1, m1-4-1 to m4-4-1 and r-4-1.

addresses src1-1 to src4-2 are input, control signal generat- Control circuit S4-2 is used corresponding to source address

ing circuit 19 receives eight one-bit data (i.e., eight signals src4-2, and receives signals VALID, PIPE and STAGE from

VALIDs), eight two-bit data (i.e., eight signals PIPEs) and fields 23, 25 and 27 for an entry corresponding to source

eight two-bit data (i.e., eight signals STAGEs) from validity 10 address src4-2. Then control circuit S4-2 generates control

field 23, pipe field 25 and stage field 27, respectively. signals e1-4-2 to e4-4-2, m1-4-2 to m4-4-2, and r-4-2.

Control signal generating circuit 19 generates control signals FIG. 8 is a circuit diagram showing the detail of control

e1-1-1 to r-4-2 for controlling tristate buffers circuit S1-1 of FIG. 7. Referring to FIG. 8, control circuit

T1-T72 according to data input from the three fields 23, 25 S1-1 includes AND circuits 29-51 and an NOR

and 27 of instruction grasping circuit 15. 15 circuit 53. The circles added to inputs of AND circuits 29-33

FIG. 7 is a schematic block diagram showing control and 37-51 indicate that inverted signals are input to the

signal generating circuit 19 of FIG. 5. Referring to FIG. 7, AND circuits. A signal PIPE [0] indicates the first bit of a

the control signal generating circuit includes eight control two-bit signal PIPE, and a signal PIPE [1] indicates the

circuits S1-1, S1-2, S2-1, S2-2, S3-1, S3-2, S4-1 and S4-2. second bit of the two-bit signal PIPE. A signal STAGE [0]

The eight control circuits S1-1 to S4-2 are provided corre- 20 indicates the first bit of a two-bit signal STAGE, and a signal

sponding to eight source addresses src1-1 to src4-2. For STAGE [1] indicates the second bit of the two-bit signal

example, a control circuit S1-1 is provided corresponding to STAGE.

a source address src1-1. Control circuit S1-1 will now be

specifically described with reference to FIGS. 3, 4, 5 and 7. AND circuits 29-35 receive signals VALID, PIPE [0]

Control circuit S1-1 receives three signals from three fields 25 and PIPE [1]. AND circuits 37-51 receive signals STAGE

23, 25 and 27 for an entry corresponding to source address [0] and STAGE [1]. AND circuits 37 and 39 receive an

output signal from AND circuit 29. AND circuits 41 and 43

src1-1. That is, it receives a one-bit signal VALID, a two-bit

receive an output signal from AND circuit 31. AND circuits

signal PIPE and a two-bit signal STAGE from validity field

23, pipe field 25 and stage field 27, respectively. The signal 45 and 47 receive an output signal from AND circuit 33.

VALID indicates whether the signals PIPE and STAGE are 30 AND circuits 49 and 51 receive an output signal from AND

circuit 35. NOR circuit 53 receives output signals from AND

valid or invalid. The signal PIPE indicates in which func-

tional unit an instruction (i.e., a processing result of an circuits 37-51.

instruction) having the same destination address as source AND circuits 29-35 are provided for identifying the

address src1-1 exists. The signal STAGE indicates in which functional units. That is, AND circuits 29-35 are provided

stage the instruction (the processing result of the instruction) 35 for identifying a functional unit in which an instruction (i.e.,

having the same destination address as source address src1-1 a processing result of an instruction) grasped by using an

exists. Control circuit S1-1 generates control signals e1-1-1 entry corresponding to source address src1-1 exists. When

to e4-1-1, m1-1-1 to m4-1-1, and r-1-1 for controlling data in pipe field 25 and stage field 27 for the entry

tristate buffers T1, T9, T17, T25, T33, T41, T49, T57 and corresponding to source address src1-1 are valid, that is,

T65 connecting with bus 1-1, according to the signals 40 when signals PIPE [0], PIPE [1], STAGE [0] and STAGE [1]

VALID, PIPE and STAGE. are valid, the signal VALID is "1". AND circuits 37-51 are

The operations of control circuits S1-2 to S4-2 are similar provided for identifying the stages. That is, AND circuits

to that of control circuit S1-1. That is, the control circuit 37-51 are provided for identifying a stage in which the

S1-2 is used corresponding to source address src1-2, and instruction (i.e., the processing result of the instruction)

receives signals VALID, PIPE and STAGE from fields 23, 45 grasped by using the entry corresponding to the source

25 and 27 for an entry corresponding to source address address src1-1 exists.

src1-2. Then, control circuit S1-2 generates control signals Thus, a functional unit and a stage in which an instruction

e1-1-2 to e4-1-2, m1-1-2 to m4-1-2, and r-1-2. Control having a destination address matching with source address

circuit S2-1 is used corresponding to source address src2-1, src1-1 exists are identified by AND circuits 29-51. That is,

and receives signals VALID, PIPE and STAGE from fields 50 a functional unit and a result buffer therein in which a

23, 25 and 27 for an entry corresponding to source address processing result of an instruction having a destination
src2-1. Then, control circuit S2-1 generates control signals address matching with source address src1-1 exists are identified.
e1-2-1 to e4-2-1, m1-2-1 to m4-2-1, and r-2-1. Control Then, control signals e1-1-1 to e4-1-1, m1-1-1 to m4-1-1 for
circuit S2-2 is used corresponding to source address src2-2, turning on a tristate buffer connecting with bus 1-1 and
and receives signals VALID, PIPE and STAGE from fields 55 corresponding to the identified result buffer are generated in

23, 25 and 27 for an entry corresponding to source address order to transfer the processing result of the instruction held
src2-2. Then, control circuit S2-2 generates control signals in the identified result buffer to latch circuit L1 via bus 1-1.
e1-2-2 to e4-2-2, m1-2-2 to m4-2-2, and r-2-2. Control A specific example will now be described with reference
circuit S3-1 is used corresponding to source address src3-1, to FIGS. 3 and 8. It is assumed that signals PIPE "0", PIPE
and receives signals VALID, PIPE and STAGE from fields 60 "1", PIPE "2" and PIPE "3" indicate functional units 7-1,
23, 25 and 27 for an entry corresponding to source address 7-2, 7-3 and 7-4, respectively. It is also assumed that signals
src3-1. Then, control circuit S3-1 generates control signals STAGE "0", STAGE "1", STAGE "2" and STAGE "3"
e1-3-1 to e4-3-1, m1-3-1 to m4-3-1, and r-3-1. indicate instruction decoding stage ID, execution stage EX,

Control circuit S3-2 is used corresponding to source memory access stage MEM and write back stage WB,

address src3-2, and receives signals VALID, PIPE and 65 respectively. When the signals PIPE and STAGE are "0" and

STAGE from fields 23, 25 and 27 for an entry corresponding "1", respectively, they indicate that an instruction (i.e., a

to source address src3-2. Then control circuit S3-2 generates processing result of an instruction) having a destination
address matching with source address src1-1 exists in result exists in the write back stage WB, that is, when a processing

buffer e1 of the execution stage EX in functional unit 7-1. result of an instruction having a destination address corre-

Therefore, tristate buffer T1 need be turned on to bypass a sponding to an entry exists in the write back stage WB, stage

processing result of the instruction stored in result buffer e1 field 55 for the entry is "3".

to latch circuit L1. Since the signal PIPE is "0", the signals 5 It is stage field control circuit 54 that thus sets (i.e.,

PIPE [0] and PIPE [1] are "0"s. Accordingly, when the updates) data in stage field 55. "1" is added to a value of

signal VALID is "1", the output of only AND circuit 29 is stage field 55 by adder 57 per clock cycle, that is, when an

set to "1". Furthermore, since the signal STAGE is "1", the instruction moves to another stage. Then, the added value is

signals STAGE [0] and STAGE [1] are "0" and "1", respec- stored in stage field 55. Data comparator 59 compares the

tively. Thus, the output of only AND circuit 37 is set to "1". 10 value in stage field 55 with a value "3" stored in reference

That is, only control signal e1-1-1 is set to "1". The control circuit 61. When the value in stage field 55 is "3", data

signal e1-1-1 which has been set to "1" turns on tristate comparator 59 generates a signal VALID RESET. That is,

buffer T1. when an instruction moves to the write back stage WB, a

When data in pipe field 25 and stage field 27 for the entry signal VALID RESET is output from data comparator 59

corresponding to source address src1-1 are invalid, that is, 15 and a validity field for an entry corresponding to stage field

when signals PIPE [0], PIPE [1], STAGE [0] and STAGE [1] control circuit 54 is reset. The reset field indicates that the

are invalid, the signal VALID is "0". The invalidness of data pipe field and stage field for the entry are invalid. The reason

in pipe field 25 and stage field 27 for the entry corresponding why a signal VALID RESET is output when an instruction

to source address src1-1 means that a processing result of an exists in the write back stage WB is that bypass is not

instruction having a destination address matching with 20 required and that data may be read out directly from register

source address src1-1 does not exist in any of the result file 5.

buffers of any of the functional units. In such a case, Referring again to FIGS. 3, 5 and 6, setting or resetting of

therefore, data corresponding to source address src1-1 need instruction grasping circuit 15 will be specifically described.

be read out from register file 5 to latch circuit L1. That is, When a destination address des1 of an instruction to be input

a control signal which turns on tristate buffer T65 need be 25 to functional unit 7-1 is assumed to be "1", field control

generated. Since the signal VALID is "0", output signals of circuit 21-1 sets a signal ADDRESS to "1". The signal

AND circuits 29-35 are all set to "0"s, which in turn sets all ADDRESS "1" selects an entry f1 corresponding to the

of the output signals of AND circuits 37-51 to "0"s. destination address des1 "1". That is, three fields 23, 25 and

Accordingly, an output signal of only NOR circuit 53 is set 27 for the selected entry f1 are set or reset. Then, field

to "1". That is, only control signal r-1-1 is set to "1" and 30 control circuit 21-1 generates a signal VALID SET for

tristate buffer T65 turns on. The configurations of control setting validity field 23 for entry f1 to "1". When validity

circuits S1-2 to S4-2 of FIG. 7 are similar to that of control field 23 is "1", it indicates that data in pipe field 25 and stage

circuit S1-1 shown in FIG. 8. field 27 are valid. When validity field 23 is "0", data in pipe

FIG. 9 is a schematic block diagram showing a portion of field 25 and stage field 27 are invalid.

stage field 27 of FIG. 5 and a stage field control circuit for 35 Furthermore, field control circuit 21-1 generates a signal

controlling a portion of stage field 27. Referring to FIG. 9, PIPE SET for setting pipe field 25 for entry f1 to "0". Pipe

a stage field 55 corresponds to one entry in stage field 27 of fields 25 "0", "1", "2" and "3" indicate functional units 7-1,

FIG. 5. That is, stage field 55 is the stage field for one entry. 7-2, 7-3 and 7-4, respectively. Furthermore, field control

Furthermore, a stage field control circuit 54 is provided circuit 21-1 generates a signal STAGE RESET for resetting

corresponding to stage field 55. That is, instruction grasping 40 stage field 27 for entry f1. When the signal STAGE RESET

circuit 15 (FIG. 6) is provided with a plurality of stage field is input to stage field 27, stage field 27 is set to "0". Stage fields

control circuits 54 corresponding to a plurality of entries 27 "0", "1", "2" and "3" indicate that an instruction exists in

f1-fn. instruction decoding stage ID, execution stage EX, memory

Stage field control circuit 54 includes an adder 57, a data access stage MEM and write back stage WB, respectively.

comparator 59 and a reference circuit 61. 45 Referring now to FIGS. 3, 5, 7, 8 and 6, bypass control

Stage field 55 for an entry is provided for indicating in performed by bypass control circuit 13 will be described

which stage an instruction having a destination address more specifically. Source address src1-1 input to address

corresponding to the entry exists. That is, stage field 55 for decoder 17 is assumed to be "2". Address decoder 17 reads

an entry is provided for indicating in which result buffer a out an entry f2 for instruction grasping circuit 15 corre-

processing result of an instruction having a destination 50 sponding to source address src1-1 "2", and transmits data of

address corresponding to the entry exists. When stage field validity field 23, pipe field 25 and stage field 27 for entry f2

55 is in an initial state or is reset by a signal STAGE RESET, corresponding to source address src1-1 "2" to control signal

stage field 55 is "0". That is, when an instruction having a generating circuit 19. That is, signals VALID, PIPE and

destination address corresponding to an entry exists in the STAGE are input to control circuit S1-1 from validity field

instruction decoding stage ID, stage field 55 for the entry is 55 23, pipe field 25 and stage field 27, respectively. When the

"0". When an instruction having a destination address cor- signals VALID, PIPE and STAGE are all "1"s, that is, when

responding to an entry exists in the execution stage EX, that an instruction (i.e., a processing result of an instruction)

is, when a processing result of an instruction having a having a destination address matching with a source address

destination address corresponding to an entry exists in a src1-1 exists in result buffer e2 in functional unit 7-2, the

result buffer of the execution stage EX, stage field 55 of the 60 output of only AND circuit 31 (FIG. 8) is set to "1" since the

entry is "1". When an instruction having a destination signals VALID, PIPE [0] and PIPE [1] are "1", "0" and "1",

address corresponding to an entry exists in the memory respectively, and the output of only AND circuit 41 is set to

access stage MEM, that is, when a processing result of an "1" since the signals STAGE [0] and STAGE [1] are "0" and

instruction having a destination address corresponding to an "1", respectively. That is, only control signal e2-1-1 is set to

entry exists in a result buffer of the memory access stage 65 "1". Thus, tristate buffer T17 of FIG. 3 turns on and the

MEM, stage field 55 for the entry is "2". When an instruc- processing result of the instruction held in result buffer e2 is

tion having a destination address corresponding to an entry transferred to latch circuit L1 via bus 1-1.
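The stage field update and the decoding performed by control circuit S1-1 can be summarised in a behavioural Python sketch. It models the described behaviour rather than the gates of FIGS. 8 and 9, and rests on assumptions that the text does not spell out: the function names are hypothetical, the stage field is advanced only while the entry is valid, and any case in which no result buffer is selected (including a valid entry still in stage ID) is treated as a register file read, following the description of NOR circuit 53.

```python
# Stage encodings given in the text: "0" ID, "1" EX, "2" MEM, "3" WB.
STAGE_ID, STAGE_EX, STAGE_MEM, STAGE_WB = 0, 1, 2, 3


def advance_stage(valid: int, stage: int) -> tuple:
    """One clock cycle of the stage field control for a single entry.

    Assumption of this sketch: an invalid entry is left untouched.
    """
    if not valid:
        return valid, stage
    stage += 1                # adder 57 adds "1" per clock cycle
    if stage == STAGE_WB:     # data comparator 59 against the reference value "3"
        valid = 0             # signal VALID RESET: read register file 5 instead
    return valid, stage


def decode(valid: int, pipe: int, stage: int, bus: str = "1-1") -> str:
    """Name of the single control signal asserted for one source address.

    `pipe` is 0..3 for functional units 7-1..7-4; `bus` defaults to "1-1",
    the bus of latch circuit L1 handled by control circuit S1-1.
    """
    if valid and stage == STAGE_EX:
        return f"e{pipe + 1}-{bus}"   # result buffer of the execution stage
    if valid and stage == STAGE_MEM:
        return f"m{pipe + 1}-{bus}"   # result buffer of the memory access stage
    return f"r-{bus}"                 # no result buffer selected: register file 5


if __name__ == "__main__":
    valid, stage = 1, STAGE_ID        # entry just set by a field control circuit
    for _ in range(3):
        valid, stage = advance_stage(valid, stage)
        print(valid, stage, decode(valid, pipe=1, stage=stage))
```

With PIPE "1" and STAGE "1" the sketch yields e2-1-1, matching the worked example above in which tristate buffer T17 turns on.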
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Referring to FIGS. 5 and 6, the reason why the VLIW A control signal generating circuit of the VLIW processor 

processor according to the first embodiment dispenses with according to the second embodiment is similar to that of the 

the priority selection (i.e., comparison) will be described. With instruction VLIW processor according to the first embodiment shown in

grasping circuit 15 grasping an instruction by an entry, when FIG. 7, except for the following: as described above, since 

an instruction having a destination address corresponding to 5 each of functional units 7-1 to 7-4 is represented in pipe field 

the entry is input to any of functional units 7-1 to 7-4, that 25 (FIG. 6) using bit vector representation using four bits, a 

is, when a destination address corresponding to the entry is signal PIPE input from pipe field 25 to control circuits S1-1

newly input to any of field control circuits 21-1 to 21-4, the to S4-2 is a four-bit signal. Accordingly, the specific circuit

newly input instruction is grasped by the entry. For example, configuration of control circuits S1-1 to S4-2 of the VLIW

when the destination address of the instruction currently 10 processor according to the second embodiment differs from

grasped is "1" and the destination address of the next that of control circuits S1-1 to S4-2 of the VLIW processor

instruction to be input is also "1", overwriting is carried out according to the first embodiment shown in FIG. 8.

at entry f1, since the entry corresponding to the destination FIG. 10 is a circuit diagram showing control circuit S1-1

address "1" is entry f1 only. Overwriting is thus performed (FIG. 7) of the VLIW processor according to the second

in instruction grasping circuit 15 so that the priority selec- 15 embodiment in detail. Similar portions thereof to those

tion is not required. shown in FIG. 8 are labeled by similar reference characters

In the VLIW processor according to the first embodiment, and the description thereof is, where appropriate, not

as described above, a bypass control circuit controls bypass repeated. Referring to FIG. 10, control circuit S1-1 includes

by grasping in which result buffer in which functional unit AND circuits 63-69 and 37-51 and an NOR circuit 53. AND 

a processing result of an instruction exists. Thus, a com- 20 circuits 63-69 receive a signal VALID. AND circuits 63, 65, 

parator for comparing a destination address with a source 67 and 69 receive signals PIPE [0], PIPE [1], PIPE [2] and 

address, and hence the priority selection are not required. PIPE [3], respectively. 

Consequently, in the VLIW processor according to the first Referring to FIG. 5 also, the signal PIPE [0] corresponds

embodiment, the circuitry is simplified and fast bypass to the first bit of the pipe field, that is, it indicates the first

control can be achieved. 25 bit of a signal PIPE; the signal PIPE [1] corresponds to the

[SECOND EMBODIMENT] second bit of the pipe field, that is, it indicates the second bit

of a signal PIPE; the signal PIPE [2] corresponds to the third

The configuration of a VLIW processor as a parallel bit of the pipe field, that is, it indicates the third bit of a

processor according to a second embodiment is similar to signal PIPE; and the signal PIPE [3] corresponds to the

that of the VLIW processor according to the first embodi- 30 fourth bit of the pipe field, that is, it indicates the fourth bit

ment shown in FIGS. 1 and 3. The form of a basic instruction of a signal PIPE. When a signal PIPE [0] is "1", an

decoded at the instruction decoding stage ID of the VLIW instruction (i.e., a processing result of an instruction)

processor according to the second embodiment is similar to grasped exists in functional unit 7-1; when a signal PIPE

that shown in FIG. 2. Correspondence of tristate buffers [1] is "1", an instruction (a processing result of an

T1-T72 to control signals e1-1-1 to r-4-2 for controlling 35 instruction) grasped exists in functional unit 7-2; when

tristate buffers T1-T72 in the VLIW processor according to

a signal PIPE [2] is "1", an instruction (a processing result of

the second embodiment is similar to that shown in FIG. 4. an instruction) grasped exists in functional unit 7-3; and

A bypass control circuit of the VLIW processor according to when a signal PIPE [3] is "1", an instruction (a processing

the second embodiment is similar to the bypass control

result of an instruction) grasped exists in functional unit 7-4.

circuit according to the first embodiment shown in FIG. 5. 40

Thus, a functional unit in which an instruction grasped exists

An instruction grasping circuit of the VLIW processor can be identified by AND circuits 63-69. The specific

according to the second embodiment is similar to that of the configuration of control circuits S1-2 to S4-2 is the same as

VLIW processor according to the first embodiment shown in that of control circuit S1-1 shown in FIG. 10.
FIG. 6, except for the following: in the instruction grasping

circuit of the VLIW processor according to the first 45 A stage field and a stage field control circuit of the VLIW

embodiment, four functional units 7-1 to 7-4 are represented processor according to the second embodiment are similar to

in pipe field 25 using two bits. On the other hand, in the those according to the first embodiment shown in FIG. 9.

VLIW processor according to the second embodiment, four In the VLIW processor according to the second

functional units 7-1 to 7-4 are represented in pipe field 25 by embodiment, as described above, four functional units 7-1 to

bit vector representation using four bits. For example, in the 50 7-4 (FIG. 3) are represented in pipe field 25 (FIG. 5) of the

instruction grasping circuit of the VLIW processor accord- bypass control circuit by bit vector representation using four

ing to the second embodiment, when the first bit of pipe field bits. That is, a bit vector representation the number of bits of 

25 is "1", it indicates that an instruction grasped exists in which is equal to that of the functional units is used to 

functional unit 7-1, when the second bit of pipe field 25 is represent the functional units. Thus, the control circuit (FIG. 

"1", it indicates that an instruction grasped exists in tunc- 55 10 ) is more simplified than the control circuit (FIG. 8) of the 

tional unit 7-2, when the third bit of pipe field 25 is "1", it VLIW processor according to the first embodiment. Conse- 

indicates that an instruction grasped exists in functional unit quently faster bypass control can be achieved in the VLIW 

7-3, and when the fourth bit of pipe field 25 is "1", it processor according to the second embodiment than in that 

indicates that an instruction grasped exists in functional unit according to the first embodiment. 

7-4. Thus, in the VLIW processor according to the second 60 rxwiDn FiwmnniMPMTl 

embodiment, each functional unit is represented by bit [Ihiku tJViriuuiMiiiN i j 

vector representation the number of bits of which is equal to The configuration of a VLIW processor as a parallel 

the number of functional units 7-1 to 7-4. Since the bit vector processor according to a third embodiment is similar to that 

representation using four bits is used in pipe field 25 to according to the first embodiment shown in FIGS. 1 and 3. 

represent each functional unit, the number of bits of a signal 65 The form of a basic instruction decoded at the instruction 

PIPE input from pipe field 25 to control signal generating decoding stage ID of the VLIW processor according to the 

circuit 19 (FIG. 5) is also four. third embodiment is similar to that shown in FIG. 2. 
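
As a rough illustration of the two-bit versus four-bit pipe field described above for the second embodiment, the following Python sketch models only the functional-unit selection part of the control circuit. It is a simplified model written for this explanation, not circuitry taken from the figures: the function names are hypothetical, the two-bit case assumes the encoded value is simply the unit index (0 for unit 7-1 through 3 for unit 7-4), and stage selection is left out.

```python
from typing import List


def select_unit_encoded(valid: int, pipe: int) -> List[int]:
    """Two-bit pipe field (first embodiment, assumed encoding 0..3).

    The encoded value has to be decoded before a unit can be selected.
    """
    return [1 if (valid == 1 and pipe == unit) else 0 for unit in range(4)]


def select_unit_bit_vector(valid: int, pipe_bits: List[int]) -> List[int]:
    """Four-bit bit vector pipe field (second embodiment).

    Mirrors AND circuits 63, 65, 67 and 69: each one ANDs VALID with
    PIPE [0], PIPE [1], PIPE [2] or PIPE [3], with no decoding step.
    """
    return [valid & bit for bit in pipe_bits]


# A grasped result sits in functional unit 7-3 (third position).
print(select_unit_encoded(1, 2))                # two-bit value 2
print(select_unit_bit_vector(1, [0, 0, 1, 0]))  # PIPE [2] is "1"
print(select_unit_bit_vector(0, [0, 0, 1, 0]))  # VALID is "0": nothing selected
```

Under these assumptions both representations pick the same unit; the bit vector form only removes the decode step, which is the simplification the second embodiment attributes to the control circuit of FIG. 10.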
Correspondence of tristate buffers T1-T72 of the VLIW processor according to the third embodiment to control signals e1-1-1 to r-4-2 is similar to that shown in FIG. 4. A bypass control circuit of the VLIW processor according to the fourth embodiment is similar to that according to the first embodiment shown in FIG. 5.

An instruction grasping circuit of the VLIW processor according to the third embodiment is similar to that according to the first embodiment shown in FIG. 6, except for the following: in stage field 27 of the VLIW processor according to the first embodiment, a stage in which an instruction grasped exists is represented by two bits. On the other hand, in stage field 27 of the instruction grasping circuit of the VLIW processor according to the third embodiment, a stage in which an instruction grasped exists is represented by bit vector representation using four bits. For example, the first bit of stage field 27 is set to "1" when an instruction exists in the instruction decoding stage ID, the second bit thereof is set to "1" when an instruction exists in the execution stage EX, the third bit thereof is set to "1" when an instruction exists in the memory access stage MEM, and the fourth bit thereof is set to "1" when an instruction exists in the write back stage WB. Since four stages are thus represented in stage field 27 using the bit vector representation using four bits, a signal STAGE input from stage field 27 to control signal generating circuit 19 (FIG. 5) is also a four-bit signal.

A control signal generating circuit of the VLIW processor according to the third embodiment is similar to that according to the first embodiment shown in FIG. 7, except for the following: as described above, since in the VLIW processor according to the third embodiment, four stages are represented in stage field 27 by bit vector representation using four bits, a signal STAGE input from stage field 27 to control circuits S1-1 to S4-2 is also a four-bit signal. Thus, the specific circuit configuration of control circuits S1-1 to S4-2 differs from that of control circuits S1-1 to S4-2 of the VLIW processor according to the first embodiment.

FIG. 11 is a circuit diagram showing the detail of the control circuit (FIG. 7) of the VLIW processor according to the third embodiment of the present invention. Similar portions thereof to those shown in FIG. 8 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated.

Referring to FIG. 11, control circuit S1-1 includes AND circuits 29-35 and 71-85 and an NOR circuit 53. AND circuits 71 and 73 receive an output signal from AND circuit 29. AND circuits 75 and 77 receive an output signal from AND circuit 31. AND circuits 79 and 81 receive an output signal from AND circuit 33. AND circuits 83 and 85 receive an output signal from AND circuit 35. AND circuits 71, 75, 79 and 83 receive a signal STAGE [1]. AND circuits 73, 77, 81 and 85 receive a signal STAGE [2].

The signal STAGE [1] indicates the value of the second bit of stage field 27 (FIG. 5), i.e., the second bit of a signal STAGE. That is, the signal STAGE [1] is the signal indicating whether an instruction grasped exists in the instruction execution stage EX. The signal STAGE [2] indicates the value of the third bit of stage field 27, i.e., the third bit of a signal STAGE. That is, the signal STAGE [2] is the signal indicating whether an instruction grasped exists in the memory access stage MEM. More specifically, when the signal STAGE [1] is "1", it indicates that an instruction grasped exists in the instruction execution stage EX. When the signal STAGE [2] is "1", it indicates that an instruction grasped exists in the memory access stage MEM. Since control circuit S1-1 contemplates bypassing processing results of instructions held in result buffers e1-e4 in the execution stage EX and in result buffers m1-m4 in the memory access stage MEM, signals STAGE [0] and STAGE [3] corresponding to the first and fourth bits of stage field 27, respectively, need not be input to control circuit S1-1. While AND circuits 37-51 shown in FIG. 8 (the first embodiment) for identifying the stages each have three inputs, AND circuits 71-85 shown in FIG. 11 for identifying the stages each have two inputs.

Referring to FIGS. 3 and 11, the operation will be specifically described. When a source address of data required by latch circuit L1 is assumed to match with a destination address of a processing result (i.e., data) of an instruction existing in result buffer e3, the signals VALID and PIPE [0] are "1" and the signal PIPE [1] is "0". Accordingly, the output signal of only AND circuit 33 is set to "1". Furthermore, as the signals STAGE [1] and STAGE [2] are "1" and "0", respectively, the output signal of only AND circuit 79 is set to "1". That is, only control signal e3-1-1 is set to "1". Accordingly, tristate buffer T33 turns on and the processing result of the instruction held in result buffer e3 is transferred to latch circuit L1.

When a source address of data to be input to latch circuit L1 does not match with any of eight destination addresses of processing results of eight instructions held in eight result buffers e1-e4 and m1-m4, the signal VALID is "0". Therefore, output signals of AND circuits 71-85 are all set to "0"s. Accordingly, the output of NOR circuit 53 is set to "1". That is, control signal r-1-1 is set to "1". Accordingly, tristate buffer T65 turns on, and data from the register file 5 is input to latch circuit L1. The circuit configuration of control circuits S1-2 to S4-2 is similar to that of the control circuit shown in FIG. 11.

FIG. 12 is a schematic block diagram showing a portion of stage field 27 (FIG. 5) and a bit shifter controlling a portion of stage field 27 of the VLIW processor according to the third embodiment. Referring to FIG. 12, a portion 87 of a stage field is a portion of stage field 27 (FIG. 5) and corresponds to one entry. Thus, a bit shifter 89 provided corresponding to a portion 87 of the stage field also corresponds to one entry. In a portion 87 of the stage field, the stages are identified by bit vector representation using four bits. Thus, bit shifter 89 sets the first bit [0] of a portion 87 of the stage field to "1" when an instruction exists in the instruction decoding stage ID, it sets the second bit [1] of a portion 87 of the stage field to "1" when an instruction exists in the execution stage EX, it sets the third bit [2] of a portion 87 of the stage field to "1" when an instruction exists in the memory access stage MEM, and it sets the fourth bit [3] of the portion 87 of the stage field to "1" when an instruction exists in the write back stage WB.

In other words, whenever an instruction moves to another stage, i.e., per clock cycle, bit shifter 89 sets the bit of the bit vector corresponding to the stage. For example, when an instruction exists in the execution stage EX, the bit vector of a portion 87 of the stage field is 0100. The signal VALID RESET for resetting validity field 23 (FIG. 5) may be generated when the instruction moves to the write back stage WB. Thus, the value of the fourth bit [3] of a portion 87 of stage field is adapted to serve as the signal VALID RESET.

As described above, in the VLIW processor according to the third embodiment, the stages are represented by bit vector representation, that is, four stages are represented by bit vector representation using four bits. Thus, the control circuit (FIG. 11) and the circuit (bit shifter 89 shown in FIG.
12) controlling stage field 27 (FIG. 5) are simplified as compared with the control circuit (FIG. 8) and the circuit (stage field control circuit 54 shown in FIG. 9) controlling stage field 27 of the VLIW processor according to the first embodiment. Consequently, in the VLIW processor according to the third embodiment, still faster generation of control signals e1-1-1 to r-4-2 and the signal VALID RESET and hence faster bypass control can be achieved.

Furthermore, a characteristic portion of the VLIW processor according to the second embodiment may be combined with that of the VLIW processor according to the third embodiment. That is, four functional units 7-1 to 7-4 are represented by bit vector representation using four bits, and the four stages (instruction decoding stage ID, execution stage EX, memory access stage MEM and write back stage WB) are represented by bit vector representation using four bits. In such an example, the specific circuit configuration of control circuits S1-1 to S4-2 (FIG. 7) is rendered different.

FIG. 13 is a circuit diagram showing the detail of control circuit S1-1 (FIG. 7) when the characteristic portion of the VLIW processor according to the second embodiment is combined with that of the VLIW processor according to the third embodiment. Referring to FIG. 13, control circuit S1-1 includes AND circuits 63-69 and 71-85 and an NOR circuit 53. Similar portions thereof to those shown in FIGS. 10 and 11 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated. The specific circuit configuration of control circuits S1-2 to S4-2 is also similar to that of control circuit S1-1 shown in FIG. 13. When the characteristic portion of the VLIW processor according to the second embodiment is thus combined with that of the VLIW processor according to the third embodiment, the control circuit is more simplified than the control circuit (FIG. 10) of the VLIW processor according to the second embodiment and the control circuit (FIG. 11) of the VLIW processor according to the third embodiment. Consequently, still faster bypass control can be achieved as compared with the VLIW processors according to the second and third embodiments.

[FOURTH EMBODIMENT]

The configuration of a VLIW processor as a parallel processor according to a fourth embodiment is similar to that of the VLIW processor according to the first embodiment shown in FIGS. 1 and 3. The form of a basic instruction decoded at the instruction decoding stage ID of the VLIW processor according to the fourth embodiment is similar to that shown in FIG. 2. Correspondence of tristate buffers T1-T72 of the VLIW processor according to the fourth embodiment to control signals e1-1-1 to r-4-2 is similar to that shown in FIG. 4.

FIG. 14 is a schematic block diagram showing bypass control circuit 13 (FIG. 3) of the VLIW processor according to the fourth embodiment. Similar portions thereof to those shown in FIG. 5 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated. Referring to FIG. 14, the bypass control circuit includes field control circuits 21-1 to 21-4, an instruction grasping circuit 88, an address decoder 17 and a control signal generating circuit 90. Instruction grasping circuit 88 includes a functional unit field (referred to as "a pipe field" hereinafter) 25 and a stage field 27.

The bypass control circuit shown in FIG. 5 differs from that shown in FIG. 14 in that the bypass control circuit shown in FIG. 14 does not have validity field 23 existing in the bypass control circuit of FIG. 5. In the bypass control circuit of the VLIW processor according to the fourth embodiment, validness/invalidness of data in pipe field 25 and stage field 27 is determined in the following manner.

Instruction grasping circuit 88 is divided into a plurality of entries f1-fn, similar to the instruction grasping circuit of FIG. 6.

Pipe field 25 is similar to that of the VLIW processor according to the second embodiment. That is, four functional units 7-1 to 7-4 are represented by bit vector representation using four bits. It is assumed that when any one of values of the first to fourth bits of the bit vector in pipe field 25 for an entry is "1", data in pipe field 25 and stage field 27 for the entry are valid. On the other hand, it is assumed that when values of the first to fourth bits of the bit vector in pipe field 25 for an entry are all "0"s, data in pipe field 25 and stage field 27 for the entry are invalid.

In such an example, initially all bits of the bit vector of pipe field 25 need be initialized to "0"s. Furthermore, when an instruction exists in the write back stage WB, all bits of the bit vector in pipe field 25 for an entry corresponding to a destination address of the instruction need be reset. This is because when an instruction (i.e., a processing result of an instruction) exists in the write back stage WB, data may be read out directly from the register file and hence bypassing is not required. Furthermore, the stage field control circuit controlling stage field 27 is similar to that shown in FIG. 9. Therefore, all bits of the bit vector in pipe field 25 are reset by the signal VALID RESET generated by stage field control circuit 54 shown in FIG. 9.

Since the functional units are represented in pipe field 25 by bit vector representation using four bits, four-bit signals PIPEs are accordingly input from pipe field 25 to control signal generating circuit 90. Since there are eight source addresses src1-1 to src4-2, eight four-bit signals PIPEs are input from pipe field 25 to control signal generating circuit 90.

FIG. 15 is a schematic block diagram showing a control signal generating circuit 90 shown in FIG. 14. Similar portions thereof to those shown in FIG. 7 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated. Referring to FIG. 15, control circuit S1-1 receives a four-bit signal PIPE and a two-bit signal STAGE. Responsively, control circuit S1-1 generates control signals e1-1-1 to e4-1-1, m1-1-1 to m4-1-1, and r-1-1. Control circuits S1-2 to S4-2 are similar to control circuit S1-1.

FIG. 16 is a circuit diagram showing the detail of control circuit S1-1 shown in FIG. 15. Referring to FIG. 16, control circuit S1-1 includes AND circuits 91-105 and an NOR circuit 53. AND circuits 91-105 receive signals STAGE [0] and STAGE [1]. AND circuits 91 and 93 receive a signal PIPE [0]. AND circuits 95 and 97 receive a signal PIPE [1]. AND circuits 99 and 101 receive a signal PIPE [2]. AND circuits 103 and 105 receive a signal PIPE [3]. The signals PIPE [0], PIPE [1], PIPE [2] and PIPE [3] are similar to those shown in FIG. 10 (the second embodiment). The signals STAGE [0] and STAGE [1] are similar to those shown in FIG. 10.

The operation of control circuit S1-1 shown in FIG. 16 will now be described with further reference to FIG. 3. When a source address of data to be required by latch circuit L1 is assumed to match with a destination address of a processing result (i.e., data) of an instruction held in result buffer e3, the signals PIPE [0], PIPE [1] and PIPE [3] are "0"s and the signal PIPE [2] is "1", and the signals STAGE [0] and STAGE [1] are "0" and "1", respectively. Thus, the
output signal of only AND circuit 99 is set to "1". That is, only control signal e3-1-1 is set to "1". Thus, tristate buffer T33 turns on and the processing result of the instruction held in result buffer e3 is transferred to latch circuit L1.

When a source address of data to be required by latch circuit L1 does not match with any of the destination addresses of processing results (i.e., data) of instructions existing in result buffers e1-e4 and m1-m4, that is, when data in the pipe field and stage field for an entry corresponding to the source address of data to be required by latch circuit L1 are invalid, signals PIPE [0]-PIPE [3] are all set to "0"s. Therefore, output signals of AND circuits 91-105 are all set to "0"s. Thus, control signal r-1-1 output from NOR circuit 53 is set to "1". Responsively, tristate buffer T65 turns on and data is read out directly from register file 5 to latch circuit L1. The circuit configuration of control circuits S1-2 to S4-2 is similar to that of control circuit S1-1 shown in FIG. 16.

As described above, in the VLIW processor according to the fourth embodiment, the four functional units are represented in pipe field 25 by bit vector representation using four bits, and pipe field 25 performs the function of validity field 23 shown in FIG. 5. Thus, instruction grasping circuit 88 in the VLIW processor according to the fourth embodiment is more miniaturized than that according to the first embodiment (FIG. 5).

Furthermore, since pipe field 25 performs the function of validity field 23 shown in FIG. 5 in the VLIW processor according to the fourth embodiment by representing the four functional units using bit vector representation using four bits, there is no such signal VALID that is generated from validity field 23 as shown in FIG. 5. Thus, control circuits S1-1 to S4-2 are simplified as compared with control circuits S1-1 to S4-2 (FIG. 8) of the VLIW processor according to the first embodiment. As a result, still faster bypass control can be achieved in the VLIW processor according to the fourth embodiment than in that of the first embodiment.

Furthermore, the VLIW processor according to the fourth embodiment may be combined with a characteristic portion of the VLIW processor according to the third embodiment. That is, the four stages are represented in stage field 27 of the VLIW processor according to the fourth embodiment by bit vector representation using four bits. Thus, its control circuits S1-1 to S4-2 are more simplified than those shown in FIG. 16.

FIG. 17 is a circuit diagram showing the detail of control circuit S1-1 (FIG. 15) when the VLIW processor according to the fourth embodiment is combined with a characteristic portion of the VLIW processor according to the third embodiment. Similar portions thereof to those shown in FIG. 16 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated. Referring to FIG. 17, control circuit S1-1 includes AND circuits 107-121 and an NOR circuit 53. AND circuits 107, 111, 115 and 119 receive a signal STAGE [1]. AND circuits 109, 113, 117 and 121 receive a signal STAGE [2]. The signals STAGE [1] and STAGE [2] are similar to those shown in FIG. 11, respectively.

The operation of control circuit S1-1 shown in FIG. 17 will be specifically described. A source address of data to be required by latch circuit L1 is assumed to match with a destination address of a processing result (i.e., data) of an instruction held in result buffer e3. In such an example, the signals PIPE [0], PIPE [1] and PIPE [3] are "0", and the signal PIPE [2] is "1". Furthermore, the signals STAGE [1] and STAGE [2] are "1" and "0", respectively. Therefore, the output signal of only AND circuit 115 is set to "1". That is, only control signal e3-1-1 is set to "1". Accordingly, tristate buffer T33 turns on and the processing results (i.e., the data) of the instruction held in result buffer e3 is transferred to latch circuit L1.

On the other hand, when a source address of data to be required by latch circuit L1 does not match with any of destination addresses of processing results (i.e., data) of eight instructions held in eight result buffers e1-e4 and m1-m4, that is, when data in pipe field 25 and stage field 27 for an entry corresponding to a source address of data to be required by latch circuit L1 are invalid, the signals PIPE [0]-PIPE [3] are all set to "0"s. Accordingly, output signals of AND circuits 107-121 are all set to "0"s. Accordingly, control signal r-1-1 output from NOR circuit 53 is set to "1". Thus, tristate buffer T65 turns on and data is directly read out from register file 5 to latch circuit L1. The circuit configuration of control circuits S1-2 to S4-2 is similar to that of control circuit S1-1 shown in FIG. 17.

Thus, when the VLIW processor according to the fourth embodiment is combined with the characteristic portion of the VLIW processor according to the third embodiment, the control circuits S1-1 to S4-2 (FIG. 17) are more simplified than those of the VLIW processor according to the fourth embodiment (FIG. 16). Consequently, when the VLIW processor according to the fourth embodiment is combined with the characteristic portion of the VLIW processor according to the third embodiment, faster bypass control can be achieved than in the VLIW processor according to the fourth embodiment.

[FIFTH EMBODIMENT]

The entire configuration of a VLIW processor as a parallel processor according to a fifth embodiment is similar to that of the VLIW processor according to the first embodiment shown in FIG. 1.

FIG. 18 is a schematic block diagram showing a portion of the VLIW processor according to the fifth embodiment. Similar portions thereof to those shown in FIG. 3 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated.

Referring to FIG. 18, address holding circuits ee1-ee4 and mm1-mm4 are provided corresponding to result buffers e1-e4 and m1-m4, respectively. Address holding circuits ee1, ee2, ee3 and ee4 hold destination addresses e1-a, e2-a, e3-a and e4-a of processing results of instructions held in result buffers e1, e2, e3 and e4, respectively. Address holding circuits mm1, mm2, mm3 and mm4 hold destination addresses m1-a, m2-a, m3-a and m4-a of processing results of instructions held in result buffers m1, m2, m3 and m4, respectively.

Address holding circuits ee1-ee4 and mm1-mm4 are also provided in the VLIW processor of FIG. 3, even though they are not shown. Therefore, address holding circuits ee1-ee4 and mm1-mm4 are not necessarily provided only for the VLIW processor according to the fourth embodiment. As described later, destination addresses e1-a to e4-a and m1-a to m4-a held in address holding circuits ee1-ee4 and mm1-mm4 are input to bypass control circuit 123.

FIG. 19 is a schematic block diagram showing bypass control circuit 123 of FIG. 18. Similar portions thereof to those shown in FIG. 5 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated.

Referring to FIG. 19, the bypass control circuit includes field control circuits 21-1 to 21-4, an instruction grasping
circuit 125, an address decoder 17 and a control signal generating circuit 127. Instruction grasping circuit 125 includes a field indicating validness/invalidness (referred to as "a validity field" hereinafter) 23 and a functional unit field (referred to as "a pipe field" hereinafter) 25. Similar to the instruction grasping circuit shown in FIG. 6, instruction grasping circuit 125 is divided into a plurality of entries f1-fn. The bypass control circuit shown in FIG. 19 differs from that shown in FIG. 5 in that the bypass control circuit shown in FIG. 19 does not have such stage field 27 as shown in FIG. 5. Therefore, stage field control circuit 54 as shown in FIG. 9 is not provided in instruction grasping circuit 125 shown in FIG. 19, either. Control signal generating circuit 127 receives destination addresses e1-a to e4-a and m1-a to m4-a held in address holding circuits ee1-ee4 and mm1-mm4, and source addresses src1-1 to src4-2 of data to be required by latch circuits L1-L8.

FIG. 20 is a schematic block diagram showing control signal generating circuit 127 of FIG. 19. Similar portions thereof to those shown in FIG. 7 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated.

Referring to FIG. 20, control signal generating circuit 127 includes control circuits S1-1 to S4-2. Control circuits S1-1 to S4-2 receive destination addresses e1-a to e4-a and m1-a to m4-a held in address holding circuits ee1-ee4 and mm1-mm4. Control circuit S1-1 receives source address src1-1, and signals VALID and PIPE from an entry corresponding to source address src1-1. Similarly, control circuits S1-2 to S4-2 receive corresponding source addresses src1-2 to src4-2 and signals VALID and PIPE from entries corresponding to source addresses src1-2 to src4-2.

FIG. 21 is a circuit diagram showing the detail of control circuit S1-1 of FIG. 20. Similar portions thereof to those in FIG. 8 are labeled by the same reference characters and the description thereof is, where appropriate, not repeated. The signals PIPE [0] and PIPE [1] shown in FIG. 21 are similar to those shown in FIG. 8.

Referring to FIG. 21, control circuit S1-1 includes a decision circuit 128, AND circuits 137-155 and an NOR circuit 53. Decision circuit 128 includes select circuits 129 and 131 and comparators 133 and 135. Decision circuit 128 is provided for determining whether source address src1-1 of data to be required by latch circuit L1 matches with any of destination addresses m1-a to m4-a and e1-a to e4-a of processing results of instructions existing in eight result buffers e1-e4 and m1-m4 of four functional units 7-1 to 7-4, which will now be described in detail.

Select circuit 129 receives destination addresses m1-a to m4-a. Then, select circuit 129 selects destination address m1-a when a signal PIPE is "0", it selects destination address m2-a when a signal PIPE is "1", it selects destination address m3-a when a signal PIPE is "2", and it selects destination address m4-a when a signal PIPE is "3". Select circuit 131 selects destination address e1-a when a signal PIPE is "0", it selects destination address e2-a when a signal PIPE is "1", it selects destination address e3-a when a signal PIPE is "2", and it selects destination address e4-a when a signal PIPE is "3".

Comparator 133 compares a destination address selected at select circuit 129 with source address src1-1. Then, when the destination address selected at select circuit 129 matches with source address src1-1, comparator 133 outputs an output signal "1" to AND circuit 137, and when they do not match with each other, it outputs an output signal "0".

select circuit 131 with source address src1-1. Then, when the destination address selected at select circuit 131 matches with source address src1-1, comparator 135 outputs an output signal "1", and when they do not match with each other, it outputs an output signal "0".

AND circuits 137 and 139 are provided for checking whether data in pipe field 25 for an entry corresponding to source address src1-1 is valid or invalid since AND circuits 137 and 139 receive a signal VALID. When the signal VALID is "1", that is, when data in the pipe field is valid, an output signal of AND circuit 137 is set to "1" when an output signal of only comparator 133 is "1", and an output signal of only AND circuit 139 is set to "1" when an output signal of only comparator 135 is "1". When the signal VALID is "1" and comparators 133 and 135 both output signals "1"s, an output signal of AND circuit 137 is set to "0" and an output of only AND circuit 139 is set to "1", since the output signal of comparator 135 is inverted and input into AND circuit 137. That is, processing results (i.e., data) of instructions held in result buffers e1-e4 at the instruction execution stage EX are to be bypassed more preferentially than processing results of instructions held in result buffers m1-m4 at the memory access stage MEM.

AND circuits 141-147 receive an output signal from AND circuit 137. AND circuits 149-155 receive an output signal from AND circuit 139. AND circuits 141-155 receive signals PIPE [0] and PIPE [1]. NOR circuit 53 receives output signals from AND circuits 141-155. AND circuits 141-155 output control signals m1-1-1 to m4-1-1 and e1-1-1 to e4-1-1, and NOR circuit 53 outputs control signal r-1-1.

The operation of control circuit S1-1 shown in FIG. 21 will be specifically described with further reference to FIG. 18. Source address src1-1 of data to be required by latch circuit L1 is assumed to match with destination address e3-a of a processing result (i.e., data) of an instruction held in result buffer e3. The signal VALID is "1". Select circuits 129 and 131 select destination addresses m3-a and e3-a, respectively. Comparator 133 compares destination address m3-a with source address src1-1 and outputs "0". Comparator 135 compares destination address e3-a with source address src1-1 and outputs an output signal "1". Accordingly, output signals of AND circuits 137 and 139 are set to "0" and "1", respectively. The signals PIPE [0] and PIPE [1] are "1" and "0", respectively. Accordingly, the output signal of only AND circuit 153 is set to "1". That is, only control signal e3-1-1 is set to "1". Thus, tristate buffer T33 turns on and the processing result (i.e., the data) of the instruction held in result buffer e3 is transferred to latch circuit L1.

When source address src1-1 of data to be required by latch circuit L1 does not match with any of destination addresses of processing results of instructions existing in eight result buffers e1-e4 and m1-m4 in four functional units 7-1 to 7-4, that is, when data in pipe field 25 for an entry corresponding to source address src1-1 is invalid, output signals of AND circuits 137 and 139 are set to "0"s since the signal PIPE is "0", and output signals of AND circuits 141-155 are set to "0"s. Accordingly, a signal of NOR circuit 53 is set to "1". That is, only control signal r-1-1 is set to "1". Thus, tristate buffer T65 turns on and data is directly read out from register file 5 to latch circuit L1. The circuit configuration of control circuits S1-2 to S4-2 is similar to that of control circuit S1-1 shown in FIG. 21.

In the VLIW processor according to the fifth embodiment, as described above, a bypass control circuit controls bypass by grasping in which functional unit a processing result of

Comparator 135 compares a destination address selected at an instruction exists. Thus, two comparators 133 and 135 are 
sufficient for one source address. Furthermore, priority
selection between two data is sufficient for one source
address. On the other hand, a conventional VLIW processor
requires eight comparators for one source address. Also, a
conventional VLIW processor requires priority selection
among eight data. Thus, circuitry and priority selection are
more simplified in the VLIW processor according to the fifth
embodiment than in a conventional VLIW processor. This
allows faster bypass control as compared with a conventional
VLIW processor.

The VLIW processor according to the fifth embodiment
may be combined with a characteristic portion of the VLIW
processor according to the second embodiment. That is, the
four functional units are represented in pipe field 25 shown
in FIG. 19 not by two bits but by a bit vector of four bits.
Accordingly, an output signal from pipe field 25 is a four-bit
signal. Thus, the specific configuration of control circuits
S1-1 to S4-2 shown in FIG. 20 differs from that shown in
FIG. 21.

FIG. 22 is a circuit diagram showing the detail of control
circuit S1-1 (FIG. 20) when the VLIW processor according
to the fifth embodiment is combined with the characteristic
portion of the VLIW processor according to the second
embodiment. Similar portions thereof to those shown in
FIG. 21 are labeled by the same reference characters and the
description thereof is, where appropriate, not repeated.

Referring to FIG. 22, the control circuit includes a decision
circuit 128, AND circuits 137, 139 and 157-171, and an
NOR circuit 53.

Referring to FIG. 22, AND circuits 157-163 and 165-171
receive output signals from AND circuits 137 and 139,
respectively. AND circuits 157 and 165, 159 and 167, 161
and 169, and 163 and 171 receive signals PIPE [0], PIPE [1],
PIPE [2] and PIPE [3], respectively. The signals PIPE
[0]-PIPE [3] are similar to those shown in FIG. 10 (the
second embodiment). Output signals of AND circuits
157-171 are input to NOR circuit 53. AND circuits
157-171 output control signals m1-1-1 to m4-1-1 and e1-1-1
to e4-1-1, and NOR circuit 53 outputs control signal r-1-1.

The operation of control circuit S1-1 shown in FIG. 22
will now be specifically described with further reference to
FIG. 18. Source address src1-1 of data to be required by
latch circuit L1 is assumed to match with destination address
e3-a of a processing result (i.e., data) of an instruction held
in result buffer e3. The signal VALID is "1". A description of
processing by decision circuit 128 is not repeated here since
it is similar to the specific example described with reference
to FIG. 21. An output of AND circuit 137 is set to "0", and
an output signal of only AND circuit 139 is set to "1".
Signals PIPE [0], PIPE [1] and PIPE [3] are "0"s and only
a signal PIPE [2] is "1", since result buffer e3 exists in
functional unit 7-3. Accordingly, an output signal of only
AND circuit 169 is set to "1". That is, only control signal
e3-1-1 is set to "1". Thus, tristate buffer T33 turns on and the
operation result (i.e., the data) of the instruction held in
result buffer e3 is transferred to latch circuit L1.

When source address src1-1 of data to be required by latch
circuit L1 does not match with any of destination addresses
of processing results of instructions held in eight result
buffers e1-e4 and m1-m4 in four functional units 7-1 to 7-4
(i.e., when a signal VALID is "0"), the operation of control
circuit S1-1 shown in FIG. 22 is similar to that of control
circuit S1-1 shown in FIG. 21. The circuit configuration of
the other control circuits S1-2 to S4-2 is similar to that of
control circuit S1-1 shown in FIG. 22.

When the VLIW processor according to the fifth embodiment
is thus combined with a characteristic portion of the
VLIW processor according to the second embodiment, control
circuits S1-1 to S4-2 thereof are more simplified than
those of the VLIW processor according to the fourth
embodiment. Thus, when the VLIW processor according to
the fifth embodiment is combined with the characteristic
portion of the VLIW processor according to the second
embodiment, still faster bypass control can be achieved as
compared with the VLIW processor according to the fourth
embodiment.

[SIXTH EMBODIMENT]

The configuration of a VLIW processor as a parallel
processor according to a sixth embodiment is similar to
those of the VLIW processors shown in FIGS. 1 and 18. The
form of a basic instruction decoded at the decoding stage ID
of the VLIW processor according to the sixth embodiment
is similar to that shown in FIG. 2. In the VLIW processor
according to the sixth embodiment, correspondence of
[...] controlling tristate buffers T1-T72 is similar to that shown
in [...]

FIG. 23 is a schematic block diagram showing bypass
control circuit 123 (FIG. 18) of the VLIW processor according
to the sixth embodiment. Similar portions thereof to
those shown in FIG. 5 are labeled by the same reference
characters and the description thereof is, where appropriate,
not repeated.

Referring to FIG. 23, the bypass control circuit includes
field control circuits 21-1 to 21-4, an instruction grasping
circuit 173, an address decoder 171, and a control signal
generating circuit 175. Instruction grasping circuit 173
includes validity field 23 and stage field 27.

Instruction grasping circuit 173, similar to that shown
in FIG. 6, is divided into a plurality of entries f1-fn.
Instruction grasping circuit 173 shown in FIG. 23 differs
from that shown in FIG. 5 in that instruction grasping circuit
173 shown in FIG. 23 does not include pipe field 25 shown
in FIG. 5. Thus, there is no signal PIPE SET input from field
control circuits 21-1 to 21-4, either.

Control signal generating circuit 175 receives source
addresses src1-1 to src4-2 of eight data to be input to latch
circuits L1-L8. Control signal generating circuit 175 also
receives destination addresses e1-a to e4-a and m1-a to
m4-a held in address holding circuits ee1-ee4 and
mm1-mm4.

FIG. 24 is a schematic block diagram showing control
signal generating circuit 175 shown in FIG. 23. Similar
portions thereof to those shown in FIG. 7 are labeled by the
same reference characters and the description thereof is,
where appropriate, not repeated.

Referring to FIG. 24, control signal generating circuit 175
includes control circuits S1-1 to S4-2. Control circuits S1-1
to S4-2 receive destination addresses e1-a to e4-a and m1-a
to m4-a. Control circuit S1-1 receives a corresponding
source address src1-1, and signals VALID and STAGE from an
entry corresponding to source address src1-1. Similarly,
control circuits S1-2 to S4-2 also receive corresponding
source addresses src1-2 to src4-2, and signals VALID and
STAGE from entries corresponding to source addresses src1-2
to src4-2.

FIG. 25 is a circuit diagram showing the detail of control
circuit S1-1 shown in FIG. 24. Similar portions thereof to
those shown in FIG. 8 are labeled by the same reference
characters and the description thereof is, where appropriate,
not repeated. Referring to FIG. 25, control circuit S1-1
includes a decision circuit 177, AND circuits 195 to 209 and
an NOR circuit 53. Decision circuit 177 includes select
circuits 179-185 and comparators 187-193. Decision circuit
177 is provided for determining whether source address
src1-1 of data to be required by latch circuit L1 matches with
a destination address of a processing result (i.e., data) of an
instruction existing in any of result buffers e1-e4 and
m1-m4 in four functional units 7-1 to 7-4, which will be
described below in detail.

Select circuits 179, 181, 183 and 185 receive destination
addresses e1-a and m1-a, e2-a and m2-a, e3-a and m3-a,
and e4-a and m4-a, respectively. Select circuits 179-185 also
receive a signal STAGE. When the signal STAGE is "1",
that is, when it indicates the execution stage EX, select
circuits 179, 181, 183 and 185 select destination addresses
e1-a, e2-a, e3-a and e4-a, respectively. When the signal
STAGE is "2", that is, when it indicates the memory access
stage MEM, select circuits 179, 181, 183 and 185 select
destination addresses m1-a, m2-a, m3-a and m4-a,
respectively. When the signal STAGE is neither "1" nor "2",
select circuits 179-185 output "0"s.

Comparators 187, 189, 191 and 193 compare destination
addresses selected by select circuits 179, 181, 183 and 185
with source address src1-1, respectively, and output "1"s
when they match with each other and output "0"s when they
do not match with each other.
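
The select-then-compare behavior of decision circuit 177 can be sketched in a few lines of Python. This is only an illustrative model, not the circuit itself: register addresses are assumed to be plain integers, and the function names `select` and `decide` are hypothetical. The STAGE encodings ("1" for EX, "2" for MEM) are the ones stated above.

```python
from typing import List

EX, MEM = 1, 2  # STAGE encodings stated in the text


def select(stage: int, e_addr: int, m_addr: int) -> int:
    """Model of one select circuit: pick the EX or MEM destination address."""
    if stage == EX:
        return e_addr
    if stage == MEM:
        return m_addr
    return 0  # neither EX nor MEM: the select circuit outputs "0"s


def decide(stage: int, src: int, e_addrs: List[int], m_addrs: List[int]) -> List[int]:
    """Model of the decision circuit: one comparator output per functional unit."""
    return [int(select(stage, e, m) == src) for e, m in zip(e_addrs, m_addrs)]
```

Only one comparison per functional unit is made, because the STAGE value has already narrowed each unit down to a single candidate result buffer.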

Thus, decision circuit 177 determines whether source
address src1-1 of data to be required by latch circuit L1
matches with a destination address of a processing result
(i.e., data) of an instruction existing in any of eight result
buffers e1-e4 and m1-m4 in four functional units 7-1 to 7-4,
to identify a functional unit in which a processing result of
an instruction having a destination address matching with
source address src1-1 of data to be required by latch circuit
L1 exists.

That is, when an output signal of comparator 187 is "1",
it indicates that a processing result of an instruction having
a destination address matching with source address src1-1
exists in functional unit 7-1. When an output signal of
comparator 189 is "1", it indicates that a processing result of
an instruction having a destination address matching with
source address src1-1 exists in functional unit 7-2. When an
output signal of comparator 191 is "1", it indicates that a
processing result of an instruction having a destination
address matching with source address src1-1 exists in functional
unit 7-3. When an output signal of comparator 193 is
"1", it indicates that a processing result of an instruction
having a destination address matching with source address
src1-1 exists in functional unit 7-4.

When a signal STAGE [0] is "1", the signal STAGE is "2"
or "3". When a signal STAGE is "2", it indicates the memory
access stage MEM. Accordingly, the signal STAGE [0] is
input to AND circuits 197, 201, 205 and 209 generating
control signals m1-1-1, m2-1-1, m3-1-1 and m4-1-1 for
controlling tristate buffers T9, T25, T41 and T57 corresponding
to result buffers m1-m4 of the memory access stage MEM.
When a signal STAGE [1] is "1", the signal STAGE is "1"
or "3". When a signal STAGE is "1", it indicates the execution
stage EX. Accordingly, the signal STAGE [1] is
input to AND circuits 195, 199, 203 and 207 generating
control signals e1-1-1, e2-1-1, e3-1-1 and e4-1-1 for controlling
tristate buffers T1, T17, T33 and T49 corresponding
to result buffers e1-e4 of the execution stage EX.

As described above, signals STAGE [0] and STAGE [1]
need not be input to all of AND circuits 195-209. It is not
an issue that when a signal STAGE [0] is "1" or when a
signal STAGE [1] is "1", the signal STAGE may be "3",
since AND circuits 195-209 all receive a signal VALID.
That is, when a signal STAGE is "3", it indicates that a
processing result of an instruction exists in the write back stage WB,
and in this case, a signal VALID is set to "0" since data may
be read out directly from register file 5.
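
As a rough illustration of why STAGE = "3" is harmless, the two STAGE bits and the VALID gate can be modeled as below. The sketch assumes that STAGE [0] is the upper bit of the two-bit STAGE value (it is "1" for STAGE "2" or "3") and that STAGE "1" denotes EX and "2" denotes MEM, as given for the select circuits. The names `stage_bits` and `enables` are hypothetical.

```python
def stage_bits(stage: int):
    """Split the two-bit STAGE value into (STAGE[0], STAGE[1]); STAGE[0] is the upper bit."""
    return (stage >> 1) & 1, stage & 1


def enables(stage: int, valid: int):
    """Gate each stage bit with VALID, as the AND circuits do."""
    s0, s1 = stage_bits(stage)
    return {"EX": s1 & valid, "MEM": s0 & valid}
```

With STAGE at "3" both bits are set, but VALID is "0" in that case, so neither group of AND circuits can fire.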

In control circuit S1-1, processing results of instructions
held in result buffers e2 and m2 of functional unit 7-2,
result buffers e3 and m3 of functional unit 7-3, and result
buffers e4 and m4 of functional unit 7-4 are adapted to be bypassed
more preferentially than processing results of instructions
held in result buffers e1 and m1 of functional unit 7-1,
result buffers e2 and m2 of functional unit 7-2, and result
buffers e3 and m3 of functional unit 7-3, respectively. Thus, AND
circuits 195 and 197 receive output signals of comparators
187, 189, 191 and 193, AND circuits 199 and 201 receive
output signals of comparators 189, 191 and 193, AND
circuits 203 and 205 receive output signals of comparators
191 and 193, and AND circuits 207 and 209 receive an
output signal of comparator 193.
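
The priority rule can be sketched as follows. The text lists which comparator outputs each pair of AND circuits receives but does not state their polarity; the sketch assumes that the outputs of the higher-numbered comparators arrive inverted, so that a match in a higher-numbered functional unit suppresses matches in lower-numbered ones. The function name `prioritize` is hypothetical.

```python
def prioritize(matches):
    """Keep only the match belonging to the highest-numbered functional unit.

    matches[i] is the comparator output for functional unit 7-(i+1).
    Assumption: higher-unit comparator outputs act as inverted inputs.
    """
    out = []
    for i, hit in enumerate(matches):
        higher_hit = any(matches[i + 1:])
        out.append(int(bool(hit) and not higher_hit))
    return out
```

For example, `prioritize([1, 0, 1, 0])` keeps only the match for functional unit 7-3.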

The operation of control circuit S1-1 will be specifically
described with further reference to FIG. 18. When source
address src1-1 of data required by latch circuit L1 is
assumed to match with only destination address e1-a held in
address holding circuit ee1, the signal STAGE is "1".
Accordingly, select circuits 179, 181, 183 and 185 output
destination addresses e1-a, e2-a, e3-a and e4-a, respectively.
Then, an output of only comparator 187 is set to "1"
and outputs of comparators 189-193 are set to "0"s. The
signals STAGE [0] and STAGE [1] are "0" and "1", respectively.
Furthermore, a signal VALID is assumed to be "1". In
such a case, an output signal of only AND circuit 195 is set
to "1". That is, only control signal e1-1-1 is set to "1". Thus,
tristate buffer T1 turns on and the processing result of the
instruction held in result buffer e1 is transferred to latch
circuit L1.

When source address src1-1 of data to be required by
latch circuit L1 does not match with any of destination
addresses e1-a to e4-a and m1-a to m4-a held in address
holding circuits ee1-ee4 and mm1-mm4, that is, when a
signal VALID is "0", output signals of AND circuits
195-209 are all set to "0"s. Accordingly, only control signal
r-1-1 output from NOR circuit 53 is set to "1". Thus, tristate
buffer T65 turns on and data is read out directly from register
file 5 to latch circuit L1. The circuit configuration of control
circuits S1-2 to S4-2 is similar to that of control circuit S1-1
shown in FIG. 25.
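
The two cases walked through above (a bypass from result buffer e1, and the fallback to the register file) can be reproduced with a small behavioral model of control circuit S1-1. This is a sketch under stated assumptions, not the circuit of FIG. 25: addresses are plain integers, STAGE [0] is taken as the upper bit of STAGE, higher-unit comparator outputs are assumed to inhibit lower units, and the function name `control_signals` is hypothetical.

```python
def control_signals(src, stage, valid, e_addrs, m_addrs):
    """Return the nine control signals of one control circuit as a dict.

    e_addrs / m_addrs: destination addresses held for the EX and MEM
    result buffers of functional units 7-1 to 7-4 (assumed integers).
    """
    # Select circuits: EX addresses for STAGE 1, MEM addresses for STAGE 2.
    if stage == 1:
        selected = list(e_addrs)
    elif stage == 2:
        selected = list(m_addrs)
    else:
        selected = [0, 0, 0, 0]
    # Comparators: one per functional unit.
    hit = [int(a == src) for a in selected]
    # STAGE[0] is assumed to be the upper bit, STAGE[1] the lower bit.
    s0, s1 = (stage >> 1) & 1, stage & 1
    signals = {}
    for u in range(4):
        # Assumed priority: a hit in a higher-numbered unit wins.
        wins = bool(hit[u]) and not any(hit[u + 1:])
        signals[f"e{u + 1}-1-1"] = int(wins and bool(s1) and bool(valid))
        signals[f"m{u + 1}-1-1"] = int(wins and bool(s0) and bool(valid))
    # NOR circuit: read the register file when no bypass signal is active.
    signals["r-1-1"] = int(not any(signals.values()))
    return signals
```

Calling `control_signals(5, 1, 1, [5, 6, 7, 8], [1, 2, 3, 4])` models the first case: only `e1-1-1` is set. With `valid=0`, only `r-1-1` is set.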

As described above, in the VLIW processor according to 
the sixth embodiment, entries for instruction grasping circuit 
173 are used to grasp in which stage an instruction having 
a destination address corresponding to an entry exists. Thus, 
four comparators 187-193 are sufficient for one source 
address. On the other hand, a conventional VLIW processor 
requires eight comparators for one source address. 
Furthermore, in the VLIW processor according to the sixth 
embodiment, since four comparators 187-193 are sufficient 
for one source address, priority selection among four output 
signals from comparators 187-193 suffices. On the other 
hand, a conventional VLIW processor requires priority 
selection among eight data for one source address. Thus, in 
the VLIW processor according to the sixth embodiment, its 
circuitry is simplified and the number of objects to which 
priority selection is applied is reduced so that fast bypass 
control can be achieved. 
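
The saving can be put in numbers with a short calculation. The per-source comparator counts (eight for a conventional processor, two for the fifth embodiment, four for the sixth) and the eight source addresses src1-1 to src4-2 come from the description; the totals are simply derived from them and are not figures stated in the text.

```python
SOURCE_ADDRESSES = 8  # src1-1 to src4-2

# Comparators needed per source address, as stated in the description.
per_source = {
    "conventional": 8,
    "fifth embodiment": 2,
    "sixth embodiment": 4,
}

# Derived totals across all eight source addresses.
totals = {name: n * SOURCE_ADDRESSES for name, n in per_source.items()}
```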

The VLIW processor according to the sixth embodiment 
can be combined with a characteristic portion of the VLIW 
processor according to the third embodiment. That is, the 
four stages are represented in stage field 27 by bit vector
representation using four bits. This allows provision of bit
shifter 89 (FIG. 12) rather than stage field control circuit 54
(FIG. 9), to control stage field 27. Thus, still more simplified
circuitry and still faster bypass control can be achieved than
in the VLIW processor according to the sixth embodiment.
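
One way to picture why a bit shifter suffices is the one-hot sketch below. It assumes a four-bit vector in which exactly one bit marks the current stage and that advancing a stage is a shift by one position. The bit ordering, the shift direction, and the names `advance` and `current_stage` are illustrative assumptions, not details taken from FIG. 12.

```python
WIDTH = 4  # four stages, one bit each


def advance(stage_vec: int) -> int:
    """Shift the single set bit to the next stage; it drops out after the last stage."""
    return (stage_vec << 1) & ((1 << WIDTH) - 1)


def current_stage(stage_vec: int):
    """Index of the set bit, or None when no bit is set."""
    return stage_vec.bit_length() - 1 if stage_vec else None
```

Updating the field is then a single shift per pipeline step, with no encoding or decoding of a stage number.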

Although the present invention has been described and 
illustrated in detail, it is clearly understood that the same is 
by way of illustration and example only and is not to be 
taken by way of limitation, the spirit and scope of the present
invention being limited only by the terms of the appended 
claims. 

What is claimed is: 

1. A parallel processor having a register file for storing 
therein a processing result of an instruction according to a
destination address of the instruction, and processing in 
parallel a plurality of said instructions included in one basic 
instruction, said parallel processor comprising: 

a plurality of functional units each processing a corresponding
one of said instructions, each of said functional
units having a plurality of processing stages for
pipelining said corresponding one of successively input
said instructions;

bypass means for selectively supplying a plurality of said 
processing results existing in a plurality of said pro- 
cessing stages in said plurality of functional units to a 
plurality of initial ones of said processing stages in said 
plurality of functional units; and 

bypass control means using a plurality of entries corresponding
to a plurality of addresses of said register file
for controlling, by grasping in which one of said
functional units and in which one of said processing
stages said instruction having said destination address
corresponding to said entry exists, said bypass means
such that when said destination address of said instruction
existing in any of said plurality of processing
stages of said plurality of functional units matches with
a source address of said instruction to be processed at
said initial processing stage of said functional unit, said
processing result of said instruction having said matching
destination address is supplied from said processing
stage in which said instruction having said matching
destination address exists to said initial processing
stage at which said instruction having said matching
source address is to be processed,

when said bypass control means grasps said instruction
having a certain destination address and when a new
said instruction having a same one as said certain
destination address is input to any of said plurality of
functional units, said bypass control means grasping
the newly input said instruction by said entry corresponding
to said certain destination address.

2. The parallel processor according to claim 1, wherein: 
said bypass control means includes instruction grasping
means formed of a functional unit field having data
indicating in which one of said functional units said
instruction grasped exists, of a processing stage field
having data indicating in which one of said processing
stages said instruction grasped exists, and of a field
indicating validness/invalidness having data indicating
whether data in said functional unit field and said
processing stage field are valid or invalid, said instruction
grasping means being divided into said plurality of
entries;
when a new said instruction is input, said bypass control
means sets said field indicating validness/invalidness
for said entry corresponding to a destination address of
said new instruction, and sets said functional unit for 
said entry corresponding to the destination address of 
said new instruction to indicate said functional unit to 
which said new instruction is input; 

when said instruction newly input exists in a processing 
stage previous to said initial processing stage, said 
bypass control means resets said processing stage field 
for said entry corresponding to a destination address of 
said new instruction; and 

said bypass control means newly sets said processing 
stage field for said entry whenever said instruction 
having said destination address corresponding to said 
entry moves to any of said processing stages. 

3. The parallel processor according to claim 1, wherein 
said bypass control means includes instruction grasping
means formed of a functional unit field having data
indicating in which one of said functional units said 
instruction grasped exists and a processing stage field 
having data indicating in which one of said processing 
stages said instruction grasped exists, said instruction 
grasping means being divided into said plurality of 
entries, a bit vector the number of bits of which is equal 
to that of said plurality of functional units being used in 
said functional unit field in order to indicate in which 
one of said functional units said instruction exists; 

when a new said instruction is input to said functional 
unit, said bypass control means sets such bit of said bit 
vector that corresponds to said functional unit to which 
said new instruction is input in said functional unit field 
for said entry corresponding to said destination address 
of said new instruction, and when a newly input said 
instruction exists at a stage previous to said initial 
processing stage, said bypass control means resets said 
processing stage field for said entry corresponding to a 
destination address of said newly input instruction; and 

said bypass control means newly sets said processing 
stage field for said entry whenever said instruction 
having said destination address corresponding to said 
entry moves to any of said processing stages, and data 
in said functional unit field and said processing stage 
field for said entry is valid when any one of bits of said 
bit vector in said functional unit field for said entry is 
set, and data in said functional unit field and said 
processing stage field for said entry is invalid when any 
of bits of said bit vector is not set.

4. A parallel processor having a register file for storing 
therein a processing result of an instruction according to a 
destination address of the instruction, and processing in 
parallel a plurality of said instructions included in one basic 
instruction, said parallel processor comprising: 

a plurality of functional units each processing a corre- 
sponding one of said instructions, each of said func- 
tional units having a plurality of processing stages for 
pipelining said corresponding one of successively input 
said instructions; 

bypass means for selectively supplying a plurality of said 
processing results existing in a plurality of said pro- 
cessing stages in said plurality of functional units to a 
plurality of initial ones of said processing stages in said 
plurality of functional units; and 

bypass control means using a plurality of entries corre- 
sponding to a plurality of addresses of said register file 
for controlling, by grasping in which one of said 
functional units said instruction having said destination 
address corresponding to said entry exists, said bypass 
means such that when said destination address of said
instruction existing in any of said plurality of processing
stages of said plurality of functional units matches
with a source address of said instruction to be processed
at said initial processing stage of said functional
unit, said processing result of said instruction having
said matching destination address is supplied from said
processing stage in which said instruction having said
matching destination address exists to said initial processing
stage at which said instruction having said
matching source address is to be processed,

when said bypass control means grasps said instruction 
having a certain destination address and when a new 
said instruction having a same one as said certain 
destination address is input to any of said plurality of 
functional units, said bypass control means grasping
the newly input said instruction by said entry corresponding
to said certain destination address.

5. The parallel processor according to claim 4, wherein 
said bypass control means includes: 

instruction grasping means formed of a functional unit
field having data indicating in which one of said
functional units said instruction grasped exists and of a
field indicating validness/invalidness having data indicating
whether data in said functional unit field is valid
or invalid,

said instruction grasping means being divided into said 
plurality of entries, 

when a new said instruction is input into said functional
unit, said bypass control means setting said field indicating
validness/invalidness for said entry corresponding
to a destination of said new instruction, and setting
said functional unit field for said entry corresponding to
a destination address of said new instruction to indicate
said functional unit to which said new instruction is
input; and

a plurality of decision means provided corresponding to a
plurality of said source addresses of said plurality of
instructions included in one said basic instruction, each
for determining whether a source address corresponding
to said instruction newly input matches with a
destination address of said instruction existing in any of
said plurality of processing stages of said plurality of
functional units,

said decision means including a plurality of select means
provided corresponding to said plurality of processing 
stages in each of said functional units, 

said select means receiving a plurality of said destination 
addresses of a plurality of said instructions existing in 
such plurality of said processing stages in said plurality
of functional units that correspond to said select means, 
and data in said functional unit field for said entry 
corresponding to said source address of said instruction 
newly input, 

said select means outputting a destination address of an
instruction existing in said processing stage corre- 
sponding to said functional unit indicated by the input 
data of said functional unit field, 

said decision means further including a plurality of compare
means provided corresponding to said plurality of
select means, said compare means comparing said
destination address output from corresponding said
select means with said source address of said instruction
newly input and determining whether said source
address of said newly input instruction matches with
said destination address output from the corresponding
said select means.
6. A parallel processor having a register file for storing
therein a processing result of an instruction according to a 
destination address of the instruction, and processing in 
parallel a plurality of said instructions included in one basic 
instruction, said parallel processor comprising: 

a plurality of functional units each processing a corre- 
sponding one of said instructions, each of said func- 
tional units having a plurality of processing stages for 
pipelining said corresponding one of successively input 
said instructions; 

bypass means for selectively supplying a plurality of said 
processing results existing in a plurality of said pro- 
cessing stages in said plurality of functional units to a 
plurality of initial ones of said processing stages in said 
plurality of functional units; and 

bypass control means using a plurality of entries corre- 
sponding to a plurality of addresses of said register file 
for controlling, by grasping in which one of said 
processing stages said instruction having said destina- 
tion address corresponding to said entry exists, said 
bypass means such that when said destination address 
of said instruction existing in any of said plurality of 
processing stages of said plurality of functional units 
matches with a source address of said instruction to be 
processed at said initial processing stage of said func- 
tional unit, said processing result of said instruction 
having said matching destination address is supplied 
from said processing stage in which said instruction 
having said matching destination address exists to said 
initial processing stage at which said instruction having 
said matching source address is to be processed, 

when said bypass control means grasps said instruction 
having a certain destination address and when a new 
said instruction having a same one as said certain 
destination address is input to any of said plurality of 
functional units, said bypass control means grasping 
the newly input said instruction by said entry corre- 
sponding to said certain destination address. 

7. The parallel processor according to claim 6, wherein 
said bypass control means includes: 

instruction grasping means formed of a processing stage 
field having data indicating in which one of said 
processing stages said instruction grasped exists and of 
a field indicating validness/invalidness having data 
indicating whether data in said processing stage field is 
valid or invalid, 

said instruction grasping means being divided into said
plurality of entries,

said bypass control means setting said field indicating
validness/invalidness for said entry corresponding to a
destination address of a new said instruction when said
new instruction is input,

said bypass control means resetting said processing stage
field for said entry corresponding to a destination
address of said instruction newly input when said
newly input instruction exists at a stage previous to said
initial processing stage,

said bypass control means newly setting said processing 
stage field for said entry whenever said instruction 
having said destination address corresponding to said 
entry moves to any of said processing stages; and 

a plurality of decision means provided corresponding to a 
plurality of said source addresses of a plurality of said 
instructions included in one said basic instruction, each 
for determining whether a source address corresponding
to said instruction newly input matches with a
destination address of said instruction existing in any of 
said plurality of processing stages of said plurality of 
functional units, 
said decision means including a plurality of select means
provided corresponding to said plurality of functional 
units, 

said select means receiving destination addresses of a
plurality of said instructions existing in said plurality of
processing stages of a corresponding one of said functional
units, and data in said processing stage field for
an entry corresponding to a source address of said 
instruction newly input, 

said select means outputting said destination address of 
said instruction existing in said processing stage corresponding
to said processing stage indicated by the
input data of said processing stage field, 

said decision means further including a plurality of com- 
pare means provided corresponding to said plurality of 
select means, said compare means comparing said
destination address output from corresponding said
select means with said source address of said instruction
newly input and determining whether the source
address of said instruction newly input matches with
said destination address output from the corresponding
said select means. 

8. The parallel processor according to claim 2, wherein a 
bit vector the number of bits of which is equal to that of said 
plurality of functional units is used in said functional unit 
field to indicate in which one of said functional units said
instruction exists. 

9. The parallel processor according to claim 5, wherein a 
bit vector the number of bits of which is equal to that of said 
plurality of functional units is used in said functional unit 
field to indicate in which one of said functional units said
instruction exists. 

10. The parallel processor according to claim 2, wherein
a bit vector the number of bits of which is equal to that of
said plurality of processing stages in each of said
functional units is used in said processing stage field to
indicate in which one of said processing stages said
instruction exists, and wherein

said instruction grasping means sets such bit of said bit
vector that corresponds to said processing stage in
which said instruction exists.

11. The parallel processor according to claim 3, wherein
a bit vector the number of bits of which is equal to that of
said plurality of processing stages in each of said
functional units is used in said processing stage field to
indicate in which one of said processing stages said
instruction exists, and wherein

said instruction grasping means sets such bit of said bit
vector that corresponds to said processing stage in
which said instruction exists.

12. The parallel processor according to claim 7, wherein
a bit vector the number of bits of which is equal to that of
said plurality of processing stages in each of said
functional units is used in said processing stage field to
indicate in which one of said processing stages said
instruction exists, and wherein

said instruction grasping means sets such bit of said bit
vector that corresponds to said processing stage in
which said instruction exists.

13. The parallel processor according to claim 2, wherein
a final one of said processing stages in each of said
functional units writes said processing result existing
therein into said register file according to said destination
address and said bypass means is not applied, and
wherein 

said instruction grasping means includes a plurality of 
processing stage field control means provided corre- 
sponding to said plurality of entries each for controlling 
said processing stage field for a corresponding one of 
said entries, said processing stage field control means 
including: 

updating means for setting said data in said processing 
stage field for the corresponding one of said entries 
whenever said instruction moves to any of said pro- 
cessing stages, such that said data corresponds to said 
any of said processing stages to which said instruction 
moves; 

reference means having reference data corresponding to 
the final one of said processing stages; and 

data comparing means for comparing the data in said 
processing stage field with said reference data in said 
reference means and resetting said field indicating 
validness/invalidness for the corresponding one of said 
entries when the both data match with each other. 

14. The parallel processor according to claim 7, wherein
a final one of said processing stages in each of said
functional units writes said processing result existing
therein into said register file according to said destination
address and said bypass means is not applied, and
wherein 

said instruction grasping means includes a plurality of 
processing stage field control means provided corre- 
sponding to said plurality of entries each for controlling 
said processing stage field for a corresponding one of 
said entries, said processing stage field control means 
including: 

updating means for setting said data in said processing 
stage field for the corresponding one of said entries 
whenever said instruction moves to any of said pro- 
cessing stages, such that said data corresponds to said 
any of said processing stages to which said instruction 
moves; 

reference means having reference data corresponding to 
the final one of said processing stages; and 

data comparing means for comparing the data in said 
processing stage field with said reference data in said 
reference means and resetting said field indicating 
validness/invalidness for the corresponding one of said 
entries when the both data match with each other. 

15. The parallel processor according to claim 10, wherein: 
a final one of said processing stages in each of said
functional units writes said processing result existing
therein according to said destination address and said 
bypass means is not applied; 

the number of bits of said bit vector is larger by one than 
that of said plurality of processing stages in each one of 
said functional units, one bit of said bit vector indicat- 
ing said stage at which said instruction exists just 
before said instruction moves to the initial one of said 
processing stages; and 

said instruction grasping means includes a plurality of 
processing stage field control means provided corre- 
sponding to said plurality of entries each for controlling 
said processing stage field for a corresponding one of 
said entries, said processing stage field control means 
being a bit shifter and setting such bit of said bit vector 
that corresponds to said stage or any of said processing 
stages to which said instruction moves whenever said
instruction moves to said stage or any one of said 
processing stages. 

16. The parallel processor according to claim 11, wherein: 

a final one of said processing stages in each of said 
functional units writes said processing result existing 
therein according to said destination address and said 
bypass means is not applied; 

the number of bits of said bit vector is larger by one than 
that of said plurality of processing stages in each one of 
said functional units, one bit of said bit vector indicat- 
ing said stage at which said instruction exists just 
before said instruction moves to the initial one of said 
processing stages; and 

said instruction grasping means includes a plurality of 
processing stage field control means provided corre- 
sponding to said plurality of entries each for controlling 
said processing stage field for a corresponding one of 
said entries, said processing stage field control means 
being a bit shifter and setting such bit of said bit vector 
that corresponds to said stage or any of said processing 
stages to which said instruction moves whenever said 
instruction moves to said stage or any one of said 
processing stages. 

17. The parallel processor according to claim 12, wherein:

a final one of said processing stages in each of said 
functional units writes said processing result existing 
therein according to said destination address and said 
bypass means is not applied; 

the number of bits of said bit vector is larger by one than 
that of said plurality of processing stages in each one of 
said functional units, one bit of said bit vector indicat- 
ing said stage at which said instruction exists just 
before said instruction moves to the initial one of said 
processing stages; and 

said instruction grasping means includes a plurality of 
processing stage field control means provided corre- 
sponding to said plurality of entries each for controlling 
said processing stage field for a corresponding one of 
said entries, said processing stage field control means 
being a bit shifter and setting such bit of said bit vector 
that corresponds to said stage or any of said processing 
stages to which said instruction moves whenever said 
instruction moves to said stage or any one of said 
processing stages. 
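
As an informal illustration of claims 10 through 17, the processing stage field can be modelled as a one-hot bit vector that is advanced by a shift. The sketch below is written in Python rather than hardware; the three-stage depth and every name in it (Entry, issue, advance) are assumptions made for the example and are not part of the claims.

```python
from dataclasses import dataclass

NUM_STAGES = 3                      # assumed number of processing stages
WIDTH = NUM_STAGES + 1              # one extra bit for the stage before the initial one
MASK = (1 << WIDTH) - 1
FINAL_STAGE = 1 << (WIDTH - 1)      # reference data for the final processing stage


@dataclass
class Entry:
    """One entry, associated with one destination address."""
    valid: bool = False
    stage_bits: int = 0             # one-hot: bit 0 = pre-stage, bit k = processing stage k

    def issue(self):
        """A new instruction with this destination address is input."""
        self.valid = True
        self.stage_bits = 1

    def advance(self):
        """Shift the set bit as the instruction moves to the next stage."""
        if not self.valid:
            return
        self.stage_bits = (self.stage_bits << 1) & MASK
        # The final stage writes the register file, so bypass is not applied
        # and the valid field is reset when the field matches the reference.
        if self.stage_bits == FINAL_STAGE:
            self.valid = False


entry = Entry()
entry.issue()
history = [entry.stage_bits]
while entry.valid:
    entry.advance()
    history.append(entry.stage_bits)
```

With these assumed parameters the set bit walks through four positions, and the entry becomes invalid once the bit reaches the position of the final stage.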
[57] ABSTRACT

An input/output (I/O) system and method for coupling 
a host computer to a plurality of peripheral devices in 
which data destined for peripheral devices is transferred 
to an output data buffer whose locations are paired with 
output channel addresses stored in an output device 
table. A microcomputer performs any processing re- 
quired on data stored in the output data buffer by read- 
ing the address and a function code in the output device 
table then distributes processed data to an output device 
block whose locations are addresses of output channels. 
An input data buffer and input device table similarly 
arranged, collects and processes input data continu- 
ously, which input data buffer can be transferred to the 
host computer, on command, in a high speed burst. 

3 Claims, 2 Drawing Figures 
INPUT/OUTPUT SYSTEM AND METHOD FOR
DIGITAL COMPUTERS 

BACKGROUND OF THE INVENTION

1. Field of the Invention 

This invention pertains to the transfer of high speed 
bursts of data from a host computer to a plurality of 
peripheral devices. 

2. Description of the Prior Art 

Input/output peripheral devices such as digital to
analog converters have traditionally been connected to
digital computers through special purpose controllers.
One or more computer memory addresses are generally
assigned to each peripheral device. Where large numbers
of input/output channels are required, such as in
flight simulators or manufacturing processes, each peripheral
has had its own hardware dedicated address.
Such a system is inefficient because a large share of the
computer's resources must be devoted to data acquisition
and control of peripheral devices. To relieve the
computer, commonly called the host computer, from
performing input/output operations, various microprocessor
based devices have been developed to handle
input/output operations. Such devices, commonly
called front-end processors and intelligent peripheral
controllers, have increased real time system performance
by providing greater throughput (the amount of
data that the system can handle in a given time) and
faster response time (the time needed to perceive and
react to an event). The present invention, in addition to
providing increased system performance, is capable of
accepting high speed bursts of data from a host computer,
further processing the data to be transferred, then
distributing the processed data to a large number of
peripheral devices. Further, the present invention is
capable of assigning data to selected peripheral devices
under instructions from the host computer which may
be varied dynamically during system operation or may
be input by a system operator from a control terminal.

SUMMARY OF THE INVENTION 

The present invention is an input/output (I/O) system
and method for coupling a host computer to a plurality
of peripheral devices. Data to be transferred from
the host computer to peripheral devices is written by
the direct memory access technique to an output data
buffer which is a dedicated block of memory in the
programmable I/O system. From the output data buffer
data is transferred, under program control of a microcomputer
in the programmable I/O system, to the
specific addresses of each peripheral device.

Associated with the output data buffer is an output
device table. Each word in the output device table is
paired with a word in the output data buffer. The output
device table is a list of addresses of the peripheral devices
to which the data in the corresponding address of
the output data buffer is to be sent. The output device
table may be contained in read-only-memory or may be
loaded by the host computer upon system initialization.
The output device table may also be loaded "off-line"
via a terminal device and may be dynamically varied by
the host computer during system operation. A portion
of each word in the output device table is required for
the peripheral device address. Another portion of each
word in the output device table is used as a function
code to specify the type of processing to be performed
on the data in each word in the output data buffer before
the data is transferred to the specified output address.
Each function specified by the function code is
performed by a separate sub-routine which is identified
by the function code. The sub-routines for processing
data from the host computer are stored in read only
memory and are executed upon completion of an output
data transfer from the host computer to the output data
buffer. Transfer of input data from peripheral devices to
the host computer proceeds as follows: The microcomputer
13 continuously reads input data from peripheral
devices as specified by the input device table in memory
12. This data is then loaded into the input data buffer in
memory 12 for access by the host computer. This operation
is continuous unless the microcomputer is interrupted
by a higher priority task such as output data
processing.

In addition to output and input data processing and 
transfer, the input/output system is also capable of per- 
forming input/output testing, and system monitoring. 
Input/output testing may also be initiated by an inter- 
rupt from the host computer or it may be initiated by a 
Systems Operator from a terminal. When initiated, the 
input/output test program first identifies each periph- 
eral device attached to the system as to type by reading 
the identification bits in the command and status register
location associated with each peripheral device; it
then performs loop-back testing of each bit of each
input/output address. When an error is encountered, 
one particular bit is set in a block of memory called the 
error buffer. Each bit in this buffer is associated with 
one of the input/output device addresses. Upon com- 
pletion of the test function, the host computer is sig- 
nalled and can then initiate an error buffer read to local- 
ize the problem to a single input/output channel. 

The system monitor allows operation and testing of 
the programmable I/O system while it is disconnected 
from the host computer. Using a terminal device and 
the system monitor an operator can display and modify 
the contents of memory, perform input/output testing, 
execute input and output processing and enter and run 
user programs. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of the I/O system.

FIG. 2 is a block diagram of the output device table, 
output data buffer and output device block contained in 
the memory. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

The input/output (I/O) system 10 is capable of per- 
forming the following functions: output data processing 
and distribution, input data collection and processing, 
Input/Output testing, and system monitoring. 

A microcomputer 13 of a general design well known 
in the art which may include an Intel 8086 microproces- 
sor is connected in a conventional manner to perform 
the functions described herein. The microcomputer is 
coupled to receive data, address, control and interrupt 
signals from a host computer. The microcomputer is 
further coupled to provide data and control signals to 
the host computer and data, control and address signals 
to a memory and to a plurality of I/O modules. The 
memory is of a type also well known in the art and 
contains both read only memory and random access 
memory. The read-only-memory contains a plurality of 
programs which support the operation of the I/O system
10 including input and output processing programs
and their function sub-routines, an I/O test program,
and a system monitor program. A block of the random
access memory in the I/O system 10 is defined as the
output data buffer. It is into this block of memory in
memory 12 that the host computer (not shown) transfers,
in high speed bursts, output data destined for peripheral
devices. Associated with the output data buffer
is an output device table which [illegible]
consecutive memory locations in memory 12. Each
word in the output device table is paired with a word in
the output data buffer. The output device table is a list
of memory mapped addresses of I/O channels to which
the data in the output data buffer is to be sent. The output
device table may be contained in read-only-memory or
may be loaded by the host computer upon system initialization
and may be varied during system operation to
obtain a desired output. The output device table may
also be loaded when the system is "off-line" through a
terminal (not shown). The addresses of all output devices
coupled to the I/O system 10 are preferably confined
to the top 1024 addresses in memory, hence only
the 10 least significant bits of the device address appear
in the output device table, since the six most significant
bits are always all 1's. Since only the 10 least significant
bits of each word in the output device table are required
for device address specification, the 6 most significant
bits are used as a function code to specify the type, if
any, of processing to be performed on the data in each
word in the output data buffer before transferring it to
the specified output address. For example, one function
code may instruct the microcomputer 13 to regard the
number in an output buffer word as a floating point
number in the host computer's floating point format,
and to convert this number to a fixed point integer
before transmitting it to its output device. Another
function code may specify that the 16 most significant
bits in an output data buffer be regarded as a value, X_i,
and that the function

Y_i = a_i X_i + b_i

be computed, where a_i and b_i are constants stored in a
table of constants in the memory, before Y_i is transmitted
to the output device. Another function code may
require the microcomputer 13 to scale a certain number
of significant bits of the associated data buffer word
according to a specified formula. Each function specified
by the six bit function code is performed by a separate
sub-routine specified by the function code. Thus
additional processing functions may be added by adding
the required sub-routines in memory 12. The I/O system
may be made compatible with various host computers
that support high speed burst data transfers by suitable
design of the I/O system interface unit 11. The I/O
system interface unit 11 accepts bidirectional high speed
burst data from the host computer and accepts the following
discrete control signals from the host computer:
input acknowledge, output data ready, external function,
input status acknowledge, last word flag, I/O reset,
terminated device. The I/O system interface unit
provides the following discrete control signals to the
host computer: input data ready, output acknowledge,
external function acknowledge, input status ready, external
terminate, external mode, device present and
device end of block.

In response to the control signals the I/O system
interface unit initiates operations of the I/O system
which include:
Input Data Buffer: transfers the contents of the input
data buffer in the memory 12 of the I/O system 10 to a
block of memory in the host computer; Output Data
Buffer: transfers data from a block of memory in the
host computer to the output data buffer in the memory
[illegible] transfer from
the host computer to output data buffer in the I/O system
10 [illegible]
issues [illegible] to the microcomputer 13 [illegible]
output processing of data in the output data buffer
and transfer of the processed data to selected peripheral
devices; Input/Output Device Table Read/Write: initially
used during system set-up to transfer input and
output device tables from the host computer to the
input and output device table locations in the memory
12 of the I/O system 10. During system operation the
host computer may vary the input/output device tables
to provide a desired output to selected peripheral devices;
Constants Table Read/Write: used to calibrate
or change calibration of an I/O channel on which the
calibration function Y_i = a_i X_i + b_i is performed; Error
Buffer Read: transfers contents of the error buffer in
the memory 12 of the I/O system to the host computer.
The Error Buffer contents correspond to I/O channels
that did not pass the I/O test; I/O Test: provides an
interrupt to microcomputer 13 to execute a program for
testing each I/O channel on the system. The logic and
switching components required in the I/O system interface
unit 11 are specific to the host computer used and
can be readily constructed by those skilled in the art to
provide the functions described herein.

Associated with the microcomputer and its memory
are a plurality of input/output (I/O) modules. Each of
these modules appears as a sequence of addresses in I/O
device block in the memory 12 of the I/O system 10.
Each I/O module has the capability of coupling its
output back to its input for performing a detailed "loop
[illegible] of the I/O [illegible] (I/O)
modules are of designs well known in the art and may
include, but are not limited to the following types: digital
input module, digital output module, analog output
module, analog input module, and peripheral driver
[illegible]. By providing each I/O [illegible]
status address located 1024 words below the first address
of that module, it is possible to automatically
identify which I/O channels are actually populated
[illegible] I/O modules and what the module type is. The
[illegible] word [illegible] a module identification
[illegible] which [illegible] identifies the module [illegible].

The operation of each function performed by input/output
system 10 will now be described.

Output Data Processing: After data has been transferred
from the host computer to the output data buffer
in the I/O system 10 the host computer (not shown)
issues an output processing interrupt signal to microcomputer
13.

Upon receipt of the output processing interrupt, the
microcomputer 13 proceeds as follows in accordance
with the output processing program in the Appendix. It
reads the first word in the output device table. If the
function code is not the "return" code, it jumps to a
subroutine specified by the function code. This subroutine
transfers the data in the first word of the output
data buffer to the output address specified in the output
device table, after the required processing. The next
word in the output device table is then read and the
process repeated. When a "return" function code is 
read, the process ends and the microcomputer returns 
to input processing. One function code is provided that 
allows a user to compose his own program or routine.
When this code is encountered, the output processing 
program takes as the starting address of the user pro- 
gram, the most significant half of the corresponding 
word pair in the output data buffer. The feature allows 
the computational power of the microcomputer to augment
that of the host computer.
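
The output processing loop described above can be sketched in a few lines of Python. This is only an illustrative model, not the output processing program from the Appendix. It assumes 16-bit table words and addresses whose six high bits are all 1's, and it invents the numeric function codes (0, 1, and 0x3F for "return") along with every function and variable name.

```python
ADDR_BITS = 10                       # low 10 bits of a table word: device address
ADDR_MASK = (1 << ADDR_BITS) - 1     # 0x03FF
TOP_ADDR_BITS = 0xFC00               # six high address bits, always 1 (assumed 16-bit addresses)
RETURN_CODE = 0x3F                   # hypothetical value of the "return" function code


def decode(word):
    """Split a device table word into (function code, full output address)."""
    function_code = (word >> ADDR_BITS) & 0x3F
    address = TOP_ADDR_BITS | (word & ADDR_MASK)
    return function_code, address


def pass_through(value, index, constants):
    """Hypothetical function code 0: send the data word unchanged."""
    return value


def linear(value, index, constants):
    """Hypothetical function code 1: compute Y = a*X + b from the constants table."""
    a, b = constants[index]
    return a * value + b


SUBROUTINES = {0: pass_through, 1: linear}


def process_output(device_table, data_buffer, constants):
    """Walk the paired table and buffer until the "return" code is read."""
    device_block = {}
    for index, word in enumerate(device_table):
        code, address = decode(word)
        if code == RETURN_CODE:
            break
        device_block[address] = SUBROUTINES[code](data_buffer[index], index, constants)
    return device_block


table = [(0 << 10) | 5, (1 << 10) | 6, RETURN_CODE << 10, (0 << 10) | 7]
data = [10, 20, 0, 99]
result = process_output(table, data, constants={1: (2, 3)})
```

In this model a new processing function is added by registering one more entry in SUBROUTINES, which mirrors the statement that functions are added by adding sub-routines to memory. The fourth table word is never reached because the "return" code precedes it.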

Input Data Processing: Input processing takes place 
in a similar manner in accordance with the input processing
program in the Appendix. Input data from peripheral
devices are read, processed by microcomputer
13 and transferred to an input data buffer whose ad- 
dresses are paired with addresses in an input device 
table. Input processing is continuous unless the mi- 
crocomputer 13 receives an interrupt signal from the 
I/O system interface unit 11. If an interrupt signal is 
received, microcomputer 13 executes the program spec- 
ified by the interrupt signal and then returns to input 
processing. Whenever input data is required by the host 
computer it initiates, by the direct memory access tech- 
nique, a transfer of input data from the input data buffer. 
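
The priority scheme for input processing can be pictured with a small Python sketch. It is a simplification under stated assumptions: pending interrupts are checked only between scans, whereas real hardware would preempt a scan in progress, and all names (scan_inputs, run, the "output" interrupt label) are hypothetical.

```python
from collections import deque


def scan_inputs(input_device_table, read_channel, input_buffer):
    """One pass: read each listed channel into its paired buffer location."""
    for index, address in enumerate(input_device_table):
        input_buffer[index] = read_channel(address)


def run(passes, input_device_table, read_channel, input_buffer, interrupts, handlers):
    """Background input scanning, with pending interrupts serviced first."""
    log = []
    for _ in range(passes):
        while interrupts:                 # higher priority work runs first
            name = interrupts.popleft()
            handlers[name]()
            log.append(name)
        scan_inputs(input_device_table, read_channel, input_buffer)
        log.append("input")
    return log


buffer = [0, 0]
pending = deque(["output"])
log = run(2, [0xFC10, 0xFC11], lambda addr: addr & 0xFF, buffer, pending,
          {"output": lambda: None})
```

Because the input data buffer is refreshed on every pass, the host can fetch a current snapshot at any time without waiting on individual peripherals.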

I/O Testing: The I/O test is performed to verify the
integrity of the I/O modules 15. I/O testing can be 
initiated by the host computer when the system is oper- 
ating or by a System Operator when the I/O system 10 
is off-line. 

When initiated by a test interrupt signal, the I/O test
program clears any previous error indications in the 
Error Buffer. It then proceeds to test each I/O module, 
logging any errors encountered. As each module is 
tested, the external inputs or outputs are disconnected 
by relays, either mechanical or electronic, on the I/O
modules. When testing of an output card is complete, 
the outputs are set to zero, and the external lines are 
reconnected. When all I/O modules have been tested, 
the test program supplies the "Test Done" signal to the
host computer, and exits back to the Input Processing
mode. Errors are reported to the host computer by 
setting bits in the Error Buffer, a 64-word block of 
memory in memory 12. Each of the possible I/O chan- 
nels has a corresponding bit in the Error Buffer. 
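
One way to picture the Error Buffer is as a 1024-bit map. The sketch below assumes 16-bit words, so that 64 words cover 1024 channels, and assumes that channel n maps to bit (n mod 16) of word (n div 16). The text does not state the bit ordering, and the function names are hypothetical.

```python
WORD_BITS = 16                       # assumed word size
ERROR_WORDS = 64                     # size of the Error Buffer in words
CHANNELS = WORD_BITS * ERROR_WORDS   # number of channels the buffer can flag


def set_error(error_buffer, channel):
    """Flag a failed channel by setting its bit."""
    word, bit = divmod(channel, WORD_BITS)
    error_buffer[word] |= 1 << bit


def failed_channels(error_buffer):
    """List the channel numbers whose bits are set, in ascending order."""
    return [
        word * WORD_BITS + bit
        for word, value in enumerate(error_buffer)
        for bit in range(WORD_BITS)
        if (value >> bit) & 1
    ]


error_buffer = [0] * ERROR_WORDS     # cleared at the start of each test
for channel in (0, 17, 1023):
    set_error(error_buffer, channel)
failures = failed_channels(error_buffer)
```

After an Error Buffer read, the host would perform the equivalent of failed_channels to localize a fault to a single channel.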

System Monitor: The system monitor enables an operator
to test and operate the programmable I/O system
through a terminal device (not shown) while in an off-line
mode. The capabilities of the system monitor include
displaying and modifying the contents of memory,
performing input/output testing, executing input/output
processing programs and entering and running
user programs.

While the invention has been described in its pre- 
ferred embodiments, it is to be understood that the 
words which have been used are words of description 
rather than of limitation and that changes may be made
within the purview of the appended claims without 
departing from the true scope and spirit of the invention 
in its broader aspects. 

We claim: 

1. An input/output (I/O) system that selectively couples
a host computer to a plurality of external peripheral
devices, said host computer having a memory with a
plurality of locations for storing data to be transferred
to said plurality of peripheral devices, said I/O system
comprising:
digital computer means coupled to receive processing 
and test interrupt signals from said host computer 
for processing and distributing host computer supplied
address, data and control signals to said plurality
of external peripheral devices,

memory means included in said digital computer 
means coupled to receive said address, data and 
control signals from said host computer for storage 
of said host computer supplied address, data and 
control signals to be transferred to said plurality of 
external peripheral devices, 

a device table in said memory means having a plural- 
ity of locations each location for storing a field 
which contains an address of a selected peripheral 
device and a function code which may be modified 
by said host computer during operation for specify- 
ing a sub-routine for processing said host computer 
supplied data by said digital computer means, 

a data buffer in said memory means having a plurality 
of locations paired with locations in said device 
table respectively for receiving address, data and 
control signals transferred in high speed bursts by 
said host computer, 

a device data block in said memory means having a 
plurality of locations corresponding to addresses in 
said device table, said locations in said data block 
being coupled to said external peripheral devices 
for the transfer thereto of host computer supplied 
data processed by said digital computer means, and 

a plurality of I/O modules each including an address 
circuit for coupling to said device data block loca- 
tions in said memory means. 

2. An input/output (I/O) system as recited in claim 1 
further including an I/O system interface means for 
coupling said host computer to said digital computer 
means, said I/O interface means configurable to inter- 
face selected host computers using high speed data 
burst transfer techniques to said digital computer 
means. 

3. A method for transferring data from a memory of 
a host computer to selected external peripheral devices 
comprising the steps of: 

pairing locations in a data buffer of an input/output 
(I/O) system with locations in a device table of said 
input/output (I/O) system, transferring in high 
speed bursts address, data and control signals from 
said memory of said host computer to said data 
buffer 

transmitting an output processing interrupt signal 
from said host computer to a digital computer 
means in said input/output (I/O) system causing 
said digital computer means to perform the follow- 
ing steps: 

reading a first word in said device table causing said 
digital computer means to transfer and process data 
stored in a first paired location in said data buffer, 
said device table words capable of being modified 
by said host computer during system operation, 

transferring processed data to an address in a device 
block in said input/output (I/O) system, said ad- 
dress specified by said first word in said device 
table 

transferring said processed data from an address in 
said device block to a selected peripheral device 

continuing to read in turn each word in said device 
table causing said digital computer means to trans- 
fer and process data stored in each paired location 
of said data buffer, transferring said processed data 
to specified device block addresses for transfer to 
selected external peripheral devices,

after reading all words in said device table said digital 
computer means resumes processing input signals 
from said external peripheral devices until receipt 
of next output processing interrupt signal from said 
host computer. 



